* [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles
@ 2021-08-25  1:51 Neeraj K. Singh via GitGitGadget
  2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-08-25  1:51 UTC
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh

Git for Windows has had fsyncing of object files enabled since "409cae91eb
(mingw: change core.fsyncObjectFiles = 1 by default, 2017-09-04)".

There have been requests to make core.fsyncObjectFiles the default
everywhere, but there are concerns about its performance cost (perf results
below). There's a long and gory thread here:
https://lore.kernel.org/git/87a7xcw8sa.fsf@linux-m68k.org/t/.

My change introduces the new 'core.fsyncObjectFiles = 2' setting, which
batches the data-integrity FLUSH command sent to the disk across multiple
loose object files added to the object database.

We take advantage of the bulk-checkin hooks already in the add command and
add some hooks to the update-index command (which is used internally by stash).
Details are in the last patch of the series.

Here's a simple performance test script:

    #!/bin/sh
    git clone https://github.com/nodejs/node.git node-repo-cache
    git clone node-repo-cache node-repo
    cd node-repo
    git --version
    
    find . -name "*.c" -exec sh -c 'echo foo1 >> "$1"' -- {} \;
    echo "----GIT stash fsync"
    time git -c core.fsyncObjectFiles=true stash push
    
    find . -name "*.c" -exec sh -c 'echo foo2 >> "$1"' -- {} \;
    echo "----GIT stash fsync_defer"
    time git -c core.fsyncObjectFiles=2 stash push
    
    find . -name "*.c" -exec sh -c 'echo foo3 >> "$1"' -- {} \;
    echo "----GIT stash no_fsync"
    time git -c core.fsyncObjectFiles=false stash push
    
    cd ..
    rm -r -f node-repo


Hardware:

 * Mac - Mac Mini 2018 running MacOS 11.5.1, APFS with a 1TB Apple NVMe SSD.
 * Linux - Ubuntu 20.04 - ext4 running on a Hyper-V VM with a fixed VHDX
   backed by a Samsung PM981.
 * Win - Windows NTFS - Same Hyper-V host as Linux.

Operation       | Mac    | Linux | Windows
----------------|--------|-------|--------
git fsync       | 40.6 s | 7.8 s | 6.9 s
git fsync_defer | 6.5 s  | 2.1 s | 3.8 s
git no_fsync    | 1.7 s  | 1.0 s | 2.6 s

The Windows version of Git is slightly different:
https://github.com/git-for-windows/git/pull/3391. I also used a
Windows-specific test script.

I hope I'm CC'ing a reasonable set of people on this patch, based on the
last discussion.

Thanks,
Neeraj Singh
Windows Core File Systems

Neeraj Singh (2):
  object-file: use futimes rather than utime
  core.fsyncobjectfiles: batch disk flushes

 Documentation/config/core.txt |  17 ++++--
 Makefile                      |   4 ++
 builtin/add.c                 |   3 +-
 builtin/update-index.c        |   3 +
 bulk-checkin.c                | 105 +++++++++++++++++++++++++++++++---
 bulk-checkin.h                |   4 +-
 compat/mingw.c                |  42 +++++++++-----
 compat/mingw.h                |   2 +
 config.c                      |   4 +-
 config.mak.uname              |   2 +
 configure.ac                  |   8 +++
 git-compat-util.h             |   7 +++
 object-file.c                 |  23 ++------
 wrapper.c                     |  36 ++++++++++++
 write-or-die.c                |   2 +-
 15 files changed, 213 insertions(+), 49 deletions(-)


base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v1
Pull-Request: https://github.com/git/git/pull/1076
-- 
gitgitgadget


* [PATCH 1/2] object-file: use futimes rather than utime
  2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
@ 2021-08-25  1:51 ` Neeraj Singh via GitGitGadget
  2021-08-25 13:51   ` Johannes Schindelin
  2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-25  1:51 UTC
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Refactor the loose object file creation code and use the futimes(2) API
rather than utime. This should be slightly faster given that we already
have an FD to work with.
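The difference between the two calls, sketched in C: futimes(2) takes an open file descriptor plus a two-element struct timeval array (access time first, then modification time), whereas utime(2) takes a path that has to be resolved again. This is a minimal sketch assuming Linux or a BSD, where futimes() exists (it is not specified by POSIX); set_fd_mtime() and stamp_scratch_file() are hypothetical names, not functions from this patch.

```c
#define _DEFAULT_SOURCE /* futimes() and mkstemp() on glibc */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the access and modification times of an
 * already-open file to `mtime` through its descriptor, so no second
 * path lookup is needed (utime() resolves the path again).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_fd_mtime(int fd, time_t mtime)
{
	struct timeval tvs[2] = {{0}};

	tvs[0].tv_sec = mtime; /* times[0] is the access time */
	tvs[1].tv_sec = mtime; /* times[1] is the modification time */
	return futimes(fd, tvs);
}

/* Usage sketch: stamp a scratch file and read its mtime back. */
static time_t stamp_scratch_file(time_t when)
{
	char path[] = "/tmp/futimes-demo-XXXXXX";
	struct stat st;
	time_t result = (time_t)-1;
	int fd = mkstemp(path);

	if (fd < 0)
		return result;
	if (set_fd_mtime(fd, when) == 0 && fstat(fd, &st) == 0)
		result = st.st_mtime;
	close(fd);
	unlink(path);
	return result;
}
```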

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.c | 42 +++++++++++++++++++++++++++++-------------
 compat/mingw.h |  2 ++
 object-file.c  | 17 ++++++++---------
 3 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 9e0cd1e097f..948f4c3428b 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -949,19 +949,40 @@ int mingw_fstat(int fd, struct stat *buf)
 	}
 }
 
-static inline void time_t_to_filetime(time_t t, FILETIME *ft)
+static inline void timeval_to_filetime(const struct timeval *t, FILETIME *ft)
 {
-	long long winTime = t * 10000000LL + 116444736000000000LL;
+	long long winTime = t->tv_sec * 10000000LL + t->tv_usec * 10 + 116444736000000000LL;
 	ft->dwLowDateTime = winTime;
 	ft->dwHighDateTime = winTime >> 32;
 }
 
-int mingw_utime (const char *file_name, const struct utimbuf *times)
+int mingw_futimes(int fd, const struct timeval times[2])
 {
 	FILETIME mft, aft;
+
+	if (times) {
+		timeval_to_filetime(&times[0], &aft);
+		timeval_to_filetime(&times[1], &mft);
+	} else {
+		GetSystemTimeAsFileTime(&mft);
+		aft = mft;
+	}
+
+	if (!SetFileTime((HANDLE)_get_osfhandle(fd), NULL, &aft, &mft)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
+
+int mingw_utime (const char *file_name, const struct utimbuf *times)
+{
 	int fh, rc;
 	DWORD attrs;
 	wchar_t wfilename[MAX_PATH];
+	struct timeval tvs[2];
+
 	if (xutftowcs_path(wfilename, file_name) < 0)
 		return -1;
 
@@ -979,17 +1000,12 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
 	}
 
 	if (times) {
-		time_t_to_filetime(times->modtime, &mft);
-		time_t_to_filetime(times->actime, &aft);
-	} else {
-		GetSystemTimeAsFileTime(&mft);
-		aft = mft;
+		memset(tvs, 0, sizeof(tvs));
+		tvs[0].tv_sec = times->actime;
+		tvs[1].tv_sec = times->modtime;
 	}
-	if (!SetFileTime((HANDLE)_get_osfhandle(fh), NULL, &aft, &mft)) {
-		errno = EINVAL;
-		rc = -1;
-	} else
-		rc = 0;
+
+	rc = mingw_futimes(fh, times ? tvs : NULL);
 	close(fh);
 
 revert_attrs:
diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..1eb14edb2ed 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -398,6 +398,8 @@ int mingw_fstat(int fd, struct stat *buf);
 
 int mingw_utime(const char *file_name, const struct utimbuf *times);
 #define utime mingw_utime
+int mingw_futimes(int fd, const struct timeval times[2]);
+#define futimes mingw_futimes
 size_t mingw_strftime(char *s, size_t max,
 		   const char *format, const struct tm *tm);
 #define strftime mingw_strftime
diff --git a/object-file.c b/object-file.c
index a8be8994814..607e9e2f80b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1860,12 +1860,13 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
+static int close_loose_object(int fd, const char *tmpfile, const char *filename)
 {
 	if (fsync_object_files)
 		fsync_or_die(fd, "loose object file");
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
+	return finalize_object_file(tmpfile, filename);
 }
 
 /* Size of directory component, including the ending '/' */
@@ -1973,17 +1974,15 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
 	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
+		struct timeval tvs[2] = {0};
+		tvs[0].tv_sec = mtime;
+		tvs[1].tv_sec = mtime;
+		if (futimes(fd, tvs) < 0)
+			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
 	}
 
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return close_loose_object(fd, tmp_file.buf, filename.buf);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
-- 
gitgitgadget



* [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
@ 2021-08-25  1:51 ` Neeraj Singh via GitGitGadget
  2021-08-25  5:38   ` Christoph Hellwig
                     ` (2 more replies)
  2021-08-25 16:58 ` [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj Singh
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
  3 siblings, 3 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-25  1:51 UTC
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
MacOS, and Linux each offer mechanisms to write data from the filesystem
page cache without initiating a hardware flush.

This patch introduces a new 'core.fsyncObjectFiles = 2' option that
takes advantage of the bulk-checkin infrastructure to batch up hardware
flushes.

When the new mode is enabled we do the following for new objects:

1. Create a tmp_obj_XXXX file and write the object data to it.
2. Issue a pagecache writeback request and wait for it to complete.
3. Record the tmp name and the final name in the bulk-checkin state for
   a later rename.
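The per-object half of this scheme might look roughly like the C sketch below. It is illustrative only: stage_object(), writeout_only(), struct pending_rename and the fixed-size pending[] array are hypothetical names, not Git's bulk-checkin code (which grows its rename array with ALLOC_GROW). Only the Linux sync_file_range() path is shown; the sketch falls back to a plain fsync() elsewhere.

```c
#define _GNU_SOURCE /* sync_file_range(), mkdtemp(), strdup() */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical record of a rename deferred until the batch ends. */
struct pending_rename {
	char *tmp;
	char *final;
};

static struct pending_rename pending[64];
static size_t nr_pending;

/*
 * Step 2: push the file's dirty pages to the device and wait, without
 * asking the drive to flush its own cache. Uses sync_file_range() on
 * Linux; elsewhere (or if the syscall is missing) fall back to fsync(),
 * which is still durable but loses the batching benefit.
 */
static int writeout_only(int fd)
{
#ifdef __linux__
	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				      SYNC_FILE_RANGE_WRITE |
				      SYNC_FILE_RANGE_WAIT_AFTER) == 0)
		return 0;
	if (errno != ENOSYS)
		return -1;
#endif
	return fsync(fd);
}

/*
 * Steps 1-3 for one object: write `len` bytes to a new tmp_obj_XXXXXX
 * file in `dir`, write it out, close it, and queue the rename to `final`.
 * A short write is treated as an error to keep the sketch small.
 * Returns 0 on success, -1 on failure.
 */
static int stage_object(const char *dir, const char *final,
			const void *data, size_t len)
{
	char tmp[4096];
	struct pending_rename *p;
	int fd, n;

	if (nr_pending == sizeof(pending) / sizeof(pending[0]))
		return -1;
	n = snprintf(tmp, sizeof(tmp), "%s/tmp_obj_XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || writeout_only(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0) {
		unlink(tmp);
		return -1;
	}
	p = &pending[nr_pending];
	p->tmp = strdup(tmp);
	p->final = strdup(final);
	if (!p->tmp || !p->final) {
		free(p->tmp);
		free(p->final);
		unlink(tmp);
		return -1;
	}
	nr_pending++;
	return 0;
}
```

After stage_object() returns, the data sits under its temporary name only; nothing is visible under the final name until the end-of-batch step runs.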

At the end of the entire transaction we:

1. Issue an fsync against the lock file to flush the hardware writeback
   cache, which should by now have processed the tmp file writes.
2. Rename all of the temp files to their final names.
3. When updating the index and/or refs, we will issue another fsync
   internal to that operation.
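A C sketch of steps 1 and 2 of this end-of-transaction half, under the same caveats: commit_batch() and struct pending_rename are hypothetical names. The patch passes the index lock file as the descriptor to flush and finalizes objects with Git's finalize_object_file() rather than the plain rename(2) used here. The sketch also relies on the patch's premise that a single hardware flush covers data already written out for other files on the same device.

```c
#define _DEFAULT_SOURCE /* mkdtemp() for the usage example */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical record of a rename deferred until the batch ends. */
struct pending_rename {
	const char *tmp;
	const char *final;
};

/*
 * End of the batch: one full flush through `flush_fd`, then the renames.
 * The flush completes before any rename, so no final name is published
 * while its data may still sit in a volatile drive cache.
 * Returns 0 on success, -1 on the first failure.
 */
static int commit_batch(int flush_fd, const struct pending_rename *pending,
			size_t nr)
{
	size_t i;

#ifdef __APPLE__
	/* On macOS, plain fsync() does not flush the drive's cache. */
	if (fcntl(flush_fd, F_FULLFSYNC) < 0)
		return -1;
#else
	if (fsync(flush_fd) < 0)
		return -1;
#endif
	for (i = 0; i < nr; i++)
		if (rename(pending[i].tmp, pending[i].final) < 0)
			return -1;
	return 0;
}
```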

On a filesystem with a single journal that is updated during name
operations (e.g. create, link, rename, etc.), such as NTFS and HFS+, we
would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the MacOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
MacOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt |  17 ++++--
 Makefile                      |   4 ++
 builtin/add.c                 |   3 +-
 builtin/update-index.c        |   3 +
 bulk-checkin.c                | 105 +++++++++++++++++++++++++++++++---
 bulk-checkin.h                |   4 +-
 config.c                      |   4 +-
 config.mak.uname              |   2 +
 configure.ac                  |   8 +++
 git-compat-util.h             |   7 +++
 object-file.c                 |  12 +---
 wrapper.c                     |  36 ++++++++++++
 write-or-die.c                |   2 +-
 13 files changed, 177 insertions(+), 30 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..3b672c2db67 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,17 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A boolean value or the number '2', indicating the level of durability
+	applied to object files.
++
+This setting controls how much effort Git makes to ensure that data added to
+the object store are durable in the case of an unclean system shutdown. If
+'false', Git allows data to remain in file system caches according to operating
+system policy, whence they may be lost if the system loses power or crashes. A
+value of 'true' instructs Git to force objects to stable storage immediately
+when they are added to the object store. The number '2' is an experimental
+value that also preserves durability but tries to perform hardware flushes in a
+batch.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 9573190f1d7..cb950ee43d3 100644
--- a/Makefile
+++ b/Makefile
@@ -1896,6 +1896,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 09e684585d9..c58dfcd4bc3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -670,7 +670,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/builtin/update-index.c b/builtin/update-index.c
index f1f16f2de52..64d025cf49e 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1152,6 +1153,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		struct strbuf unquoted = STRBUF_INIT;
 
 		setup_work_tree();
+		plug_bulk_checkin();
 		while (getline_fn(&buf, stdin) != EOF) {
 			char *p;
 			if (!nul_term_line && buf.buf[0] == '"') {
@@ -1166,6 +1168,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				chmod_path(set_executable_bit, p);
 			free(p);
 		}
+		unplug_bulk_checkin(&lock_file);
 		strbuf_release(&unquoted);
 		strbuf_release(&buf);
 	}
diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..71004db863e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,6 +3,7 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
@@ -10,6 +11,17 @@
 #include "packfile.h"
 #include "object-store.h"
 
+struct object_rename {
+	char *src;
+	char *dst;
+};
+
+static struct bulk_rename_state {
+	struct object_rename *renames;
+	uint32_t alloc_renames;
+	uint32_t nr_renames;
+} bulk_rename_state;
+
 static struct bulk_checkin_state {
 	unsigned plugged:1;
 
@@ -21,13 +33,15 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
 	struct object_id oid;
 	struct strbuf packname = STRBUF_INIT;
 	int i;
+	unsigned old_plugged;
 
 	if (!state->f)
 		return;
@@ -55,13 +69,42 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 
 clear_exit:
 	free(state->written);
+	old_plugged = state->plugged;
 	memset(state, 0, sizeof(*state));
+	state->plugged = old_plugged;
 
 	strbuf_release(&packname);
 	/* Make objects we just wrote available to ourselves */
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct bulk_rename_state *state, struct lock_file *lock_file)
+{
+	if (state->nr_renames) {
+		int i;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for (i = 0; i < state->nr_renames; i++) {
+			if (finalize_object_file(state->renames[i].src, state->renames[i].dst))
+				die_errno(_("could not rename '%s'"), state->renames[i].src);
+
+			free(state->renames[i].src);
+			free(state->renames[i].dst);
+		}
+
+		free(state->renames);
+		memset(state, 0, sizeof(*state));
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,25 +299,69 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct bulk_rename_state *state,
+				    const char *src, const char *dst)
+{
+	struct object_rename *rename;
+
+	ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
+
+	rename = &state->renames[state->nr_renames++];
+	rename->src = xstrdup(src);
+	rename->dst = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename)
+{
+	if (fsync_object_files) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_state.plugged &&
+		    fsync_object_files == 2 &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_rename_state, tmpfile, filename);
+			if (close(fd))
+				die_errno(_("error when closing loose object file"));
+
+			return 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	return finalize_object_file(tmpfile, filename);
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_state.plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	bulk_checkin_state.plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	bulk_checkin_state.plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_rename_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..8efb01ed669 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,13 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile, const char *filename);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/config.c b/config.c
index f33abeab851..375bdb24b0a 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,9 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		int is_bool;
+
+		fsync_object_files = git_config_bool_or_int(var, value, &is_bool);
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 69413fb3dc0..8c07f2265a8 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
@@ -133,6 +134,7 @@ ifeq ($(uname_S),Darwin)
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 	BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1
+	BASIC_CFLAGS += -DFSYNC_DOESNT_FLUSH=1
 	HAVE_BSD_SYSCTL = YesPlease
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index 607e9e2f80b..5f04143dde0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,16 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static int close_loose_object(int fd, const char *tmpfile, const char *filename)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-	return finalize_object_file(tmpfile, filename);
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1982,7 +1972,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
 	}
 
-	return close_loose_object(fd, tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf, filename.buf);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 563ad590df1..37a8b61a7df 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -538,6 +538,42 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	if (action == FSYNC_WRITEOUT_ONLY) {
+#ifdef __APPLE__
+		/*
+		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+	}
+
+#ifdef __APPLE__
+	return fcntl(fd, F_FULLFSYNC);
+#else
+	return fsync(fd);
+#endif
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget


* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
@ 2021-08-25  5:38   ` Christoph Hellwig
  2021-08-25 17:40     ` Neeraj Singh
  2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
  2021-08-25 18:52   ` Johannes Schindelin
  2 siblings, 1 reply; 160+ messages in thread
From: Christoph Hellwig @ 2021-08-25  5:38 UTC
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Wed, Aug 25, 2021 at 01:51:32AM +0000, Neeraj Singh via GitGitGadget wrote:
> From: Neeraj Singh <neerajsi@microsoft.com>
> 
> When adding many objects to a repo with core.fsyncObjectFiles set to
> true, the cost of fsync'ing each object file can become prohibitive.
> 
> One major source of the cost of fsync is the implied flush of the
> hardware writeback cache within the disk drive. Fortunately, Windows,
> MacOS, and Linux each offer mechanisms to write data from the filesystem
> page cache without initiating a hardware flush.
> 
> This patch introduces a new 'core.fsyncObjectFiles = 2' option that
> takes advantage of the bulk-checkin infrastructure to batch up hardware
> flushes.

Another interesting way to flush on linux would be the syncfs call,
which syncs all files on a file system.  Once you write more than
a handful or two of files, that tends to win out over a batch of fsync
calls.
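A minimal sketch of the syncfs alternative, assuming Linux with glibc 2.14 or later (syncfs(2) is Linux-specific); flush_filesystem_of() is a hypothetical name. Note that syncfs() flushes all dirty data on the filesystem, including data unrelated to Git, and that before Linux 5.8 it did not report writeback errors, only a bad descriptor.

```c
#define _GNU_SOURCE /* syncfs() and O_DIRECTORY */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush every dirty file on the filesystem that
 * contains `dir` (e.g. the objects directory) with one syncfs() call,
 * instead of one fsync() per newly written object.
 * Returns 0 on success, -1 on failure.
 */
static int flush_filesystem_of(const char *dir)
{
	int fd = open(dir, O_RDONLY | O_DIRECTORY);
	int ret;

	if (fd < 0)
		return -1;
	ret = syncfs(fd);
	close(fd);
	return ret;
}
```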


* Re: [PATCH 1/2] object-file: use futimes rather than utime
  2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
@ 2021-08-25 13:51   ` Johannes Schindelin
  2021-08-25 22:08     ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Johannes Schindelin @ 2021-08-25 13:51 UTC
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

Hi Neeraj,

Thank you so much for this patch series! Overall, I am very happy with the
direction this is going.

I will offer a couple of suggestions below, inlined.

On Wed, 25 Aug 2021, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> Refactor the loose object file creation code and use the futimes(2) API
> rather than utime. This should be slightly faster given that we already
> have an FD to work with.

If I were you, I would spell out "file descriptor" here.

>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  compat/mingw.c | 42 +++++++++++++++++++++++++++++-------------
>  compat/mingw.h |  2 ++
>  object-file.c  | 17 ++++++++---------
>  3 files changed, 39 insertions(+), 22 deletions(-)
>
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 9e0cd1e097f..948f4c3428b 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -949,19 +949,40 @@ int mingw_fstat(int fd, struct stat *buf)
>  	}
>  }
>
> -static inline void time_t_to_filetime(time_t t, FILETIME *ft)
> +static inline void timeval_to_filetime(const struct timeval *t, FILETIME *ft)
>  {
> -	long long winTime = t * 10000000LL + 116444736000000000LL;
> +	long long winTime = t->tv_sec * 10000000LL + t->tv_usec * 10 + 116444736000000000LL;

Technically, this is a change in behavior, right? We did not use to use
nanosecond precision. But I don't think that we actually make use of this
in this patch.
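For reference, the arithmetic in timeval_to_filetime() is sketched below as a standalone function (unix_to_filetime_ticks() is a hypothetical name). Because tv_usec is multiplied by 10 to obtain 100-nanosecond FILETIME ticks, the new code carries microsecond resolution, which is the finest a struct timeval can express.

```c
#include <stdint.h>

/*
 * Windows FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
 * 1601-01-01 to 1970-01-01 is 134774 days = 11644473600 s, i.e.
 * 116444736000000000 ticks. One second is 10,000,000 ticks and one
 * microsecond is 10 ticks.
 */
static int64_t unix_to_filetime_ticks(int64_t sec, int64_t usec)
{
	return sec * 10000000LL + usec * 10 + 116444736000000000LL;
}
```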

>  	ft->dwLowDateTime = winTime;
>  	ft->dwHighDateTime = winTime >> 32;
>  }
>
> -int mingw_utime (const char *file_name, const struct utimbuf *times)
> +int mingw_futimes(int fd, const struct timeval times[2])

At first, I wondered whether it would make sense to pass the access time
and the modified time separately, as pointers. I don't think that we pass
around arrays as function parameters in Git anywhere else.

But then I realized that `futimes()` is available in this precise form on
Linux and on the BSDs. Therefore, it is not up to us to decide the
function's signature.

However, now that I looked at the manual page, I noticed that this
function is not part of any POSIX standard.

Which makes me think that we will have to do a bit more than just define
it on Windows: we will have to introduce a `Makefile` knob (just like you
did with `HAVE_SYNC_FILE_RANGE` in patch 2/2) and set that specifically
for Linux and the BSDs, and use `futimes()` only if it is available
(otherwise fall back to `utime()`).

Then, as a separate patch, we should introduce this Windows-specific shim
and declare that it is available via `config.mak.uname`.

I am a _huge_ fan of patches that are so clear and obvious that bugs have
a hard time creeping in without being spotted immediately. And I think
that this organization would help achieve this goal.
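A minimal sketch of the fallback described above, assuming a hypothetical HAVE_FUTIMES knob that the Makefile would turn into -DHAVE_FUTIMES (analogous to how HAVE_SYNC_FILE_RANGE is handled in patch 2/2); the helper name set_object_mtime() is likewise hypothetical.

```c
#define _DEFAULT_SOURCE /* futimes() and mkstemp() on glibc */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/*
 * Hypothetical compat helper: build with -DHAVE_FUTIMES (an assumed
 * Makefile knob, set only on platforms known to provide futimes()) to
 * stamp the open descriptor; otherwise fall back to utime() on the path.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_object_mtime(int fd, const char *path, time_t mtime)
{
#ifdef HAVE_FUTIMES
	struct timeval tvs[2] = {{0}};

	(void)path;
	tvs[0].tv_sec = mtime; /* access time */
	tvs[1].tv_sec = mtime; /* modification time */
	return futimes(fd, tvs);
#else
	struct utimbuf utb;

	(void)fd;
	utb.actime = mtime;
	utb.modtime = mtime;
	return utime(path, &utb);
#endif
}
```

Either branch leaves the same timestamps on disk; the knob only decides whether the path has to be resolved a second time.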

>  {
>  	FILETIME mft, aft;
> +
> +	if (times) {
> +		timeval_to_filetime(&times[0], &aft);
> +		timeval_to_filetime(&times[1], &mft);
> +	} else {
> +		GetSystemTimeAsFileTime(&mft);
> +		aft = mft;
> +	}
> +
> +	if (!SetFileTime((HANDLE)_get_osfhandle(fd), NULL, &aft, &mft)) {
> +		errno = EINVAL;
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +int mingw_utime (const char *file_name, const struct utimbuf *times)

Please lose the space between the function name and the opening
parenthesis. I know, the preimage of this diff has it, but that was an
oversight and definitely disagrees with our current coding style.

> +{
>  	int fh, rc;
>  	DWORD attrs;
>  	wchar_t wfilename[MAX_PATH];
> +	struct timeval tvs[2];
> +
>  	if (xutftowcs_path(wfilename, file_name) < 0)
>  		return -1;
>
> @@ -979,17 +1000,12 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
>  	}
>
>  	if (times) {
> -		time_t_to_filetime(times->modtime, &mft);
> -		time_t_to_filetime(times->actime, &aft);
> -	} else {
> -		GetSystemTimeAsFileTime(&mft);
> -		aft = mft;
> +		memset(tvs, 0, sizeof(tvs));
> +		tvs[0].tv_sec = times->actime;
> +		tvs[1].tv_sec = times->modtime;

It is too bad that we have to copy around those values just to convert
them, but I cannot think of any better way, either. And it's not like
we're in a hot loop: this code will be dominated by I/O anyways.

>  	}
> -	if (!SetFileTime((HANDLE)_get_osfhandle(fh), NULL, &aft, &mft)) {
> -		errno = EINVAL;
> -		rc = -1;
> -	} else
> -		rc = 0;
> +
> +	rc = mingw_futimes(fh, times ? tvs : NULL);
>  	close(fh);
>
>  revert_attrs:
> diff --git a/compat/mingw.h b/compat/mingw.h
> index c9a52ad64a6..1eb14edb2ed 100644
> --- a/compat/mingw.h
> +++ b/compat/mingw.h
> @@ -398,6 +398,8 @@ int mingw_fstat(int fd, struct stat *buf);
>
>  int mingw_utime(const char *file_name, const struct utimbuf *times);
>  #define utime mingw_utime
> +int mingw_futimes(int fd, const struct timeval times[2]);
> +#define futimes mingw_futimes
>  size_t mingw_strftime(char *s, size_t max,
>  		   const char *format, const struct tm *tm);
>  #define strftime mingw_strftime
> diff --git a/object-file.c b/object-file.c
> index a8be8994814..607e9e2f80b 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1860,12 +1860,13 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  }
>
>  /* Finalize a file on disk, and close it. */
> -static void close_loose_object(int fd)
> +static int close_loose_object(int fd, const char *tmpfile, const char *filename)
>  {
>  	if (fsync_object_files)
>  		fsync_or_die(fd, "loose object file");
>  	if (close(fd) != 0)
>  		die_errno(_("error when closing loose object file"));
> +	return finalize_object_file(tmpfile, filename);

While this is a clear change of behavior, this function has only one
caller, and that caller is adjusted accordingly.

Could you add this clarification of context to the commit message? I know
it will help me in the future, when I have to get up to speed again by
reading the commit history.

Thank you,
Johannes

>  }
>
>  /* Size of directory component, including the ending '/' */
> @@ -1973,17 +1974,15 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  		die(_("confused by unstable object source data for %s"),
>  		    oid_to_hex(oid));
>
> -	close_loose_object(fd);
> -
>  	if (mtime) {
> -		struct utimbuf utb;
> -		utb.actime = mtime;
> -		utb.modtime = mtime;
> -		if (utime(tmp_file.buf, &utb) < 0)
> -			warning_errno(_("failed utime() on %s"), tmp_file.buf);
> +		struct timeval tvs[2] = {0};
> +		tvs[0].tv_sec = mtime;
> +		tvs[1].tv_sec = mtime;
> +		if (futimes(fd, tvs) < 0)
> +			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
>  	}
>
> -	return finalize_object_file(tmp_file.buf, filename.buf);
> +	return close_loose_object(fd, tmp_file.buf, filename.buf);
>  }
>
>  static int freshen_loose_object(const struct object_id *oid)
> --
> gitgitgadget
>
>

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
  2021-08-25  5:38   ` Christoph Hellwig
@ 2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
  2021-08-26  0:49     ` Neeraj Singh
  2021-08-26  5:57     ` Christoph Hellwig
  2021-08-25 18:52   ` Johannes Schindelin
  2 siblings, 2 replies; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-08-25 16:11 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Neeraj Singh


On Wed, Aug 25 2021, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> When adding many objects to a repo with core.fsyncObjectFiles set to
> true, the cost of fsync'ing each object file can become prohibitive.
>
> One major source of the cost of fsync is the implied flush of the
> hardware writeback cache within the disk drive. Fortunately, Windows,
> MacOS, and Linux each offer mechanisms to write data from the filesystem
> page cache without initiating a hardware flush.
>
> This patch introduces a new 'core.fsyncObjectFiles = 2' option that
> takes advantage of the bulk-checkin infrastructure to batch up hardware
> flushes.
>
> When the new mode is enabled we do the following for new objects:
>
> 1. Create a tmp_obj_XXXX file and write the object data to it.
> 2. Issue a pagecache writeback request and wait for it to complete.
> 3. Record the tmp name and the final name in the bulk-checkin state for
>    later rename.
>
> At the end of the entire transaction we:
> 1. Issue a fsync against the lock file to flush the hardware writeback
>    cache, which should by now have processed the tmp file writes.
> 2. Rename all of the temp files to their final names.
> 3. When updating the index and/or refs, we will issue another fsync
>    internal to that operation.
>
> On a filesystem with a singular journal that is updated during name
> operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> would expect the fsync to trigger a journal writeout so that this
> sequence is enough to ensure that the user's data is durable by the time
> the git command returns.
>
> This change also updates the MacOS code to trigger a real hardware flush
> via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> MacOS there was no guarantee of durability since a simple fsync(2) call
> does not flush any hardware caches.
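The batched sequence described in the commit message can be sketched with plain POSIX calls, as an illustration only: on Linux, `sync_file_range()` stands in for the writeout-only step, and a full `fsync()` is the fallback, as in the patch. The sketch assumes that one flush against a different file (here a placeholder `batch.lock`, where the patch uses the index lock file) also makes the earlier writeback durable, which the commit message expects of journaling filesystems like NTFS and HFS+ and which does not hold everywhere. Error paths leak temp files for brevity.

```c
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_BATCH 16

/* Per-object step: write dirty pages out without a device cache flush. */
static int writeout_only(int fd)
{
#ifdef __linux__
	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				      SYNC_FILE_RANGE_WRITE |
				      SYNC_FILE_RANGE_WAIT_AFTER) == 0)
		return 0;
#endif
	return fsync(fd); /* like the patch, fall back to a full fsync */
}

/*
 * Write n objects into dir: each goes to a temp file first, then one
 * fsync() is issued against a lock file, and only then are the temp
 * files renamed to their final names. (On macOS a real device flush
 * would need fcntl(F_FULLFSYNC) instead of fsync().)
 */
static int batch_write_objects(const char *dir, const char *const *names,
			       const char *const *data, int n)
{
	char tmp[MAX_BATCH][4096], path[4096];
	int i, fd;

	if (n < 0 || n > MAX_BATCH)
		return -1;

	for (i = 0; i < n; i++) {
		size_t len = strlen(data[i]);

		snprintf(tmp[i], sizeof(tmp[i]), "%s/tmp_obj_XXXXXX", dir);
		fd = mkstemp(tmp[i]);
		if (fd < 0)
			return -1;
		if (write(fd, data[i], len) != (ssize_t)len ||
		    writeout_only(fd) < 0) {
			close(fd);
			return -1;
		}
		if (close(fd) < 0)
			return -1;
	}

	/* One flush for the whole batch. */
	snprintf(path, sizeof(path), "%s/batch.lock", dir);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (fd < 0)
		return -1;
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	unlink(path);

	/* Renames happen only after the flush. */
	for (i = 0; i < n; i++) {
		snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
		if (rename(tmp[i], path) < 0)
			return -1;
	}
	return 0;
}
```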

Thanks for working on this; it's good to see the fsync issues picked up
again after a pause on the list.

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c04f62a54a1..3b672c2db67 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -548,12 +548,17 @@ core.whitespace::
>    errors. The default tab width is 8. Allowed values are 1 to 63.
>  
>  core.fsyncObjectFiles::
> -	This boolean will enable 'fsync()' when writing object files.
> -+
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +	A boolean value or the number '2', indicating the level of durability
> +	applied to object files.
> ++
> +This setting controls how much effort Git makes to ensure that data added to
> +the object store are durable in the case of an unclean system shutdown. If
> +'false', Git allows data to remain in file system caches according to operating
> +system policy, whence they may be lost if the system loses power or crashes. A
> +value of 'true' instructs Git to force objects to stable storage immediately
> +when they are added to the object store. The number '2' is an experimental
> +value that also preserves durability but tries to perform hardware flushes in a
> +batch.

Some feedback/thoughts:

0) Let's not expose "2" to users, but give it some friendly config name
and just translate this to the enum internally.

1) Your commit message says "When updating the index and/or refs[...]",
but we're changing core.fsyncObjectFiles here. I assume that's
summarizing existing behavior?

2) You say "when adding many [loose] objects to a repo[...]", and the
test-case is "git stash push", but for e.g. accepting pushes we have
transfer.unpackLimit.

It would be interesting to see if/how this impacts performance there,
and if it doesn't help there, that should at least be called out in the
documentation. I.e. users might want to set this differently on servers
vs. checkouts.

But also, is this sort of thing something we could mitigate even more in
commands like "git stash push" by just writing a pack instead of N loose
objects?

I don't think such questions should preclude changing the fsync
approach, or offering more options, but they should inform our
longer-term goals.

3) Re some of the musings about fsync() recently in
https://lore.kernel.org/git/877dhs20x3.fsf@evledraar.gmail.com/: is this
method of doing not-quite-an-fsync guaranteed by some OSes / POSIX etc.,
or is it more like the initial approach before core.fsyncObjectFiles,
i.e. the happy-go-lucky approach described in the "[...]that orders data
writes properly[...]" documentation you're removing?

4) While that documentation written by Linus long ago is rather
flippant, I think just removing it and not replacing it with some
discussion about how this is a trade-off vs. real-world filesystem
semantics isn't a good change.

5) Along the same lines as transfer.unpackLimit in #2, I wonder if this
fsync() setting is really something we should be splitting up. I.e.
maybe handle batch loose object writes one way, ref updates another way,
etc. I think moving core.fsync* to a setting like what we have for
fsck.* and <cmd>.fsck.* is probably a better thing to do in the longer
term.

I.e. being able to do things like:

    fsync.objectFiles = none
    fsync.refFiles = cache # or "hardware"
    receive.fsync.objectFiles = hardware
    receive.fsync.refFiles = hardware

Or whatever, i.e. we're using one hammer for all of these now, but I
suspect most users who care about fsync care about /some/ fsync, not
everything.
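A minimal sketch of how such layered settings could be resolved, assuming the key names and the `none`/`cache`/`hardware` values from the example above (none of them exist in Git today) and assuming a command-specific key should win over the generic one. The config is modeled as a plain array of key/value pairs, not Git's config machinery.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical levels matching the values in the example above. */
enum fsync_level { FSYNC_LEVEL_NONE, FSYNC_LEVEL_CACHE, FSYNC_LEVEL_HARDWARE };

struct config_entry {
	const char *key;
	const char *value;
};

static int parse_fsync_level(const char *v, enum fsync_level *out)
{
	if (!strcmp(v, "none"))
		*out = FSYNC_LEVEL_NONE;
	else if (!strcmp(v, "cache"))
		*out = FSYNC_LEVEL_CACHE;
	else if (!strcmp(v, "hardware"))
		*out = FSYNC_LEVEL_HARDWARE;
	else
		return -1;
	return 0;
}

/* Return the value for key, letting later entries override earlier ones. */
static const char *config_get(const struct config_entry *cfg, size_t n,
			      const char *key)
{
	const char *found = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(cfg[i].key, key))
			found = cfg[i].value;
	return found;
}

/* "<cmd>.fsync.<what>" wins over "fsync.<what>", which wins over def. */
static enum fsync_level fsync_level_for(const struct config_entry *cfg,
					size_t n, const char *cmd,
					const char *what, enum fsync_level def)
{
	char key[256];
	const char *v;
	enum fsync_level level;

	snprintf(key, sizeof(key), "%s.fsync.%s", cmd, what);
	v = config_get(cfg, n, key);
	if (!v) {
		snprintf(key, sizeof(key), "fsync.%s", what);
		v = config_get(cfg, n, key);
	}
	if (v && !parse_fsync_level(v, &level))
		return level;
	return def;
}
```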

6) Inline comments below.

> +struct object_rename {
> +	char *src;
> +	char *dst;
> +};
> +
> +static struct bulk_rename_state {
> +	struct object_rename *renames;
> +	uint32_t alloc_renames;
> +	uint32_t nr_renames;
> +} bulk_rename_state;

If git itself crashes, it seems we're going to leave some litter
behind in the object dir now, and "git gc" won't know how to clean it
up. I think this is going to want to just use the tmp-objdir.[ch] API,
which might or might not need to be extended for loose objects / some
oddities of this use-case.

Also, if you have a pair of things like this the string-list API is much
more pleasing to use than coming up with your own encapsulation.

>  static struct bulk_checkin_state {
>  	unsigned plugged:1;
>  
> @@ -21,13 +33,15 @@ static struct bulk_checkin_state {
>  	struct pack_idx_entry **written;
>  	uint32_t alloc_written;
>  	uint32_t nr_written;
> -} state;


> +
> +		free(state->renames);
> +		memset(state, 0, sizeof(*state));

So with this and other uses of the "state" variable, is this part of
bulk-checkin going to become thread-unsafe, or was that already the case?

> +static void add_rename_bulk_checkin(struct bulk_rename_state *state,
> +				    const char *src, const char *dst)
> +{
> +	struct object_rename *rename;
> +
> +	ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
> +
> +	rename = &state->renames[state->nr_renames++];
> +	rename->src = xstrdup(src);
> +	rename->dst = xstrdup(dst);
> +}

All boilerplate duplicating things you'd get with a string-list for free...

> +		/*
> +		 * If we have a plugged bulk checkin, we issue a call that
> +		 * cleans the filesystem page cache but avoids a hardware flush
> +		 * command. Later on we will issue a single hardware flush
> +		 * before renaming files as part of do_sync_and_rename.
> +		 */

So this is the sort of thing I meant by extending Linus's docs: I know
some FSs work this way, but not all do.

Also, there's no guarantee in git that your .git is on one FS, so I think
even for the FSs you have in mind this might not be an absolute
guarantee...

* Re: [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles
  2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
  2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
@ 2021-08-25 16:58 ` Neeraj Singh
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
  3 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-08-25 16:58 UTC (permalink / raw)
  To: Neeraj K. Singh via GitGitGadget
  Cc: Git List, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh

On Tue, Aug 24, 2021 at 6:51 PM Neeraj K. Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Hardware:
>
>  * Mac - Mac Mini 2018 running MacOS 11.5.1, APFS with a 1TB Apple NVMe SSD,
>  * Linux - Ubuntu 20.04 - ext4 running on a Hyper-V VM with a fixed VHDX
>    backed by a Samsung PM981.
>  * Win - Windows NTFS - Same Hyper-V host as Linux. Operation | Mac | Linux
>    | Windows
>
> ---------------- |---------|-------|---------- git fsync | 40.6 s | 7.8 s |
> 6.9s git fsync_defer | 6.5 s | 2.1 s | 3.8s git no_fsync | 1.7 s | 1.0 s |
> 2.6s
>
I just wanted to fix this performance test table so that it is readable.
Operation       | Mac     | Linux | Windows
----------------|---------|-------|----------
git fsync       | 40.6 s  | 7.8 s | 6.9 s
git fsync_defer | 6.5 s   | 2.1 s | 3.8 s
git no_fsync    | 1.7 s   | 1.0 s | 2.6 s

Here's the graphical version:
https://docs.google.com/spreadsheets/d/18HWXSUVAVqqKATsuVvgxDF6ftX_5qG1UGgNjGtwOuu8/edit?usp=sharing

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25  5:38   ` Christoph Hellwig
@ 2021-08-25 17:40     ` Neeraj Singh
  2021-08-26  5:54       ` Christoph Hellwig
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-08-25 17:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Tue, Aug 24, 2021 at 10:38 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 25, 2021 at 01:51:32AM +0000, Neeraj Singh via GitGitGadget wrote:
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > When adding many objects to a repo with core.fsyncObjectFiles set to
> > true, the cost of fsync'ing each object file can become prohibitive.
> >
> > One major source of the cost of fsync is the implied flush of the
> > hardware writeback cache within the disk drive. Fortunately, Windows,
> > MacOS, and Linux each offer mechanisms to write data from the filesystem
> > page cache without initiating a hardware flush.
> >
> > This patch introduces a new 'core.fsyncObjectFiles = 2' option that
> > takes advantage of the bulk-checkin infrastructure to batch up hardware
> > flushes.
>
> Another interesting way to flush on linux would be the syncfs call,
> which syncs all files on a file system.  Once you write more than
> a handful or two of files, that tends to win out over a batch of fsync
> calls.

I'd expect syncfs to suffer from the noisy-neighbor problem that Linus
alluded to on the big thread you kicked off. The equivalent call on
Windows currently requires administrative privileges (I'm really not
sure exactly why; perhaps we should change that).

If someone adds a more targeted bulk sync interface to the Linux kernel,
I'm sure Git could be changed to use it. Maybe an fcntl(2) interface
that initiates writeback and registers completion with an eventfd.
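For comparison, the `syncfs()` approach Christoph mentions is short on Linux (glibc 2.14 or later, with `_GNU_SOURCE`): open any directory on the target filesystem and call `syncfs()` on it. The sketch below, with a hypothetical helper name, also shows where the noisy-neighbor concern comes from: the call flushes every dirty file on that filesystem, not only the objects Git wrote.

```c
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush all dirty data on the filesystem containing `dir` (Linux only).
 * This includes other processes' dirty data on that filesystem, not just Git's.
 */
static int sync_containing_filesystem(const char *dir)
{
	int fd = open(dir, O_RDONLY | O_DIRECTORY);
	int ret;

	if (fd < 0)
		return -1;
	ret = syncfs(fd);
	close(fd);
	return ret;
}
```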

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
  2021-08-25  5:38   ` Christoph Hellwig
  2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
@ 2021-08-25 18:52   ` Johannes Schindelin
  2021-08-25 21:26     ` Junio C Hamano
  2021-08-26  1:19     ` Neeraj Singh
  2 siblings, 2 replies; 160+ messages in thread
From: Johannes Schindelin @ 2021-08-25 18:52 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

Hi Neeraj,

continuing my review here, inlined.

On Wed, 25 Aug 2021, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> When adding many objects to a repo with core.fsyncObjectFiles set to
> true, the cost of fsync'ing each object file can become prohibitive.
>
> One major source of the cost of fsync is the implied flush of the
> hardware writeback cache within the disk drive. Fortunately, Windows,
> MacOS, and Linux each offer mechanisms to write data from the filesystem
> page cache without initiating a hardware flush.
>
> This patch introduces a new 'core.fsyncObjectFiles = 2' option that
> takes advantage of the bulk-checkin infrastructure to batch up hardware
> flushes.

It makes sense, but I would recommend using a more easily explained value
than `2`. Maybe `delayed`? Or `bulk` or `batched`?

The way this would be implemented would look somewhat like the
implementation for `core.abbrev`, which also accepts a string ("auto") or
a Boolean (or even an integral number), see
https://github.com/git/git/blob/v2.33.0/config.c#L1367-L1381:

	if (!strcmp(var, "core.abbrev")) {
		if (!value)
			return config_error_nonbool(var);
		if (!strcasecmp(value, "auto"))
			default_abbrev = -1;
		else if (!git_parse_maybe_bool_text(value))
			default_abbrev = the_hash_algo->hexsz;
		else {
			int abbrev = git_config_int(var, value);
			if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
				return error(_("abbrev length out of range: %d"), abbrev);
			default_abbrev = abbrev;
		}
		return 0;
	}
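Along the lines of the `core.abbrev` parsing, here is a standalone sketch of a parser that accepts either a boolean or a keyword. It uses `batch` (one of the names suggested above) and a hypothetical enum; the boolean spellings are a simplified, hand-written version of what Git accepts, since Git's own helpers are not available outside the tree.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical internal enum; "batch" is one of the names suggested above. */
enum fsync_object_files {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Accept common boolean spellings or the keyword "batch".
 * A missing value (a bare "fsyncObjectFiles" line) counts as true.
 * Returns 0 on success, -1 for anything unrecognized.
 */
static int parse_fsync_object_files(const char *value,
				    enum fsync_object_files *out)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0", "" };
	size_t i;

	if (!value) {
		*out = FSYNC_OBJECT_FILES_ON;
		return 0;
	}
	if (!strcasecmp(value, "batch")) {
		*out = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++) {
		if (!strcasecmp(value, truthy[i])) {
			*out = FSYNC_OBJECT_FILES_ON;
			return 0;
		}
	}
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++) {
		if (!strcasecmp(value, falsy[i])) {
			*out = FSYNC_OBJECT_FILES_OFF;
			return 0;
		}
	}
	return -1;
}
```

Keeping the enum internal means the documented values can stay human-readable while the code continues to switch on a small integer.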

> When the new mode is enabled we do the following for new objects:
>
> 1. Create a tmp_obj_XXXX file and write the object data to it.
> 2. Issue a pagecache writeback request and wait for it to complete.
> 3. Record the tmp name and the final name in the bulk-checkin state for
>    later rename.
>
> At the end of the entire transaction we:
> 1. Issue a fsync against the lock file to flush the hardware writeback
>    cache, which should by now have processed the tmp file writes.
> 2. Rename all of the temp files to their final names.
> 3. When updating the index and/or refs, we will issue another fsync
>    internal to that operation.
>
> On a filesystem with a singular journal that is updated during name
> operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> would expect the fsync to trigger a journal writeout so that this
> sequence is enough to ensure that the user's data is durable by the time
> the git command returns.
>
> This change also updates the MacOS code to trigger a real hardware flush
> via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> MacOS there was no guarantee of durability since a simple fsync(2) call
> does not flush any hardware caches.
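The macOS part of this change boils down to preferring `fcntl(fd, F_FULLFSYNC)` over `fsync()` when a real hardware flush is wanted. Here is a minimal sketch with a hypothetical helper name. Falling back to `fsync()` when `F_FULLFSYNC` fails (it may not be supported on every filesystem) is an assumption of this sketch, not something the patch text states.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Ask for data to reach stable storage, including the drive's write cache.
 * On macOS plain fsync(2) stops short of that, so try F_FULLFSYNC first;
 * elsewhere fsync() is assumed to be sufficient.
 */
static int full_hardware_flush(int fd)
{
#ifdef __APPLE__
	if (!fcntl(fd, F_FULLFSYNC))
		return 0;
#endif
	return fsync(fd);
}
```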

You included a very nice table with performance numbers in the cover
letter. Maybe include that here, in the commit message?

> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  Documentation/config/core.txt |  17 ++++--
>  Makefile                      |   4 ++
>  builtin/add.c                 |   3 +-
>  builtin/update-index.c        |   3 +
>  bulk-checkin.c                | 105 +++++++++++++++++++++++++++++++---
>  bulk-checkin.h                |   4 +-
>  config.c                      |   4 +-
>  config.mak.uname              |   2 +
>  configure.ac                  |   8 +++
>  git-compat-util.h             |   7 +++
>  object-file.c                 |  12 +---
>  wrapper.c                     |  36 ++++++++++++
>  write-or-die.c                |   2 +-
>  13 files changed, 177 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c04f62a54a1..3b672c2db67 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -548,12 +548,17 @@ core.whitespace::
>    errors. The default tab width is 8. Allowed values are 1 to 63.
>
>  core.fsyncObjectFiles::
> -	This boolean will enable 'fsync()' when writing object files.
> -+
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +	A boolean value or the number '2', indicating the level of durability
> +	applied to object files.
> ++
> +This setting controls how much effort Git makes to ensure that data added to
> +the object store are durable in the case of an unclean system shutdown. If

In addition to the content, I also like that this tones down the
language, making it much more agreeable to read.

> +'false', Git allows data to remain in file system caches according to operating
> +system policy, whence they may be lost if the system loses power or crashes. A
> +value of 'true' instructs Git to force objects to stable storage immediately
> +when they are added to the object store. The number '2' is an experimental
> +value that also preserves durability but tries to perform hardware flushes in a
> +batch.
>
>  core.preloadIndex::
>  	Enable parallel index preload for operations like 'git diff'
> diff --git a/Makefile b/Makefile
> index 9573190f1d7..cb950ee43d3 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1896,6 +1896,10 @@ ifdef HAVE_CLOCK_MONOTONIC
>  	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
>  endif
>
> +ifdef HAVE_SYNC_FILE_RANGE
> +	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
> +endif
> +
>  ifdef NEEDS_LIBRT
>  	EXTLIBS += -lrt
>  endif
> diff --git a/builtin/add.c b/builtin/add.c
> index 09e684585d9..c58dfcd4bc3 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -670,7 +670,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>
>  	if (chmod_arg && pathspec.nr)
>  		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
> -	unplug_bulk_checkin();
> +
> +	unplug_bulk_checkin(&lock_file);
>
>  finish:
>  	if (write_locked_index(&the_index, &lock_file,
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index f1f16f2de52..64d025cf49e 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -5,6 +5,7 @@
>   */
>  #define USE_THE_INDEX_COMPATIBILITY_MACROS
>  #include "cache.h"
> +#include "bulk-checkin.h"
>  #include "config.h"
>  #include "lockfile.h"
>  #include "quote.h"
> @@ -1152,6 +1153,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>  		struct strbuf unquoted = STRBUF_INIT;
>
>  		setup_work_tree();
> +		plug_bulk_checkin();
>  		while (getline_fn(&buf, stdin) != EOF) {
>  			char *p;
>  			if (!nul_term_line && buf.buf[0] == '"') {
> @@ -1166,6 +1168,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>  				chmod_path(set_executable_bit, p);
>  			free(p);
>  		}
> +		unplug_bulk_checkin(&lock_file);
>  		strbuf_release(&unquoted);
>  		strbuf_release(&buf);
>  	}

This change to `cmd_update_index()`, would it make sense to separate it
out into its own commit? I think it would, as it is a slight change of
behavior of the `--stdin` mode, no?

> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index b023d9959aa..71004db863e 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -3,6 +3,7 @@
>   */
>  #include "cache.h"
>  #include "bulk-checkin.h"
> +#include "lockfile.h"
>  #include "repository.h"
>  #include "csum-file.h"
>  #include "pack.h"
> @@ -10,6 +11,17 @@
>  #include "packfile.h"
>  #include "object-store.h"
>
> +struct object_rename {
> +	char *src;
> +	char *dst;
> +};
> +
> +static struct bulk_rename_state {
> +	struct object_rename *renames;
> +	uint32_t alloc_renames;
> +	uint32_t nr_renames;
> +} bulk_rename_state;
> +
>  static struct bulk_checkin_state {
>  	unsigned plugged:1;
>
> @@ -21,13 +33,15 @@ static struct bulk_checkin_state {
>  	struct pack_idx_entry **written;
>  	uint32_t alloc_written;
>  	uint32_t nr_written;
> -} state;
> +
> +} bulk_checkin_state;

While it definitely looks better after this patch, having the new code
_and_ the rename in the same set of changes makes it a bit harder to
review and to spot bugs.

Could I ask you to split this rename out into its own, preparatory patch
("preparatory" meaning that it should be ordered before the patch that
adds support for the new fsync mode)?

>
>  static void finish_bulk_checkin(struct bulk_checkin_state *state)
>  {
>  	struct object_id oid;
>  	struct strbuf packname = STRBUF_INIT;
>  	int i;
> +	unsigned old_plugged;

Since this variable is designed to hold the value of the `plugged` field
of the `bulk_checkin_state`, which is declared as `unsigned plugged:1;`,
we probably want a `:1` here, too.

Also: is it really "old", rather than "orig"? I would have expected the
name `orig_plugged` or `save_plugged`.

>
>  	if (!state->f)
>  		return;
> @@ -55,13 +69,42 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
>
>  clear_exit:
>  	free(state->written);
> +	old_plugged = state->plugged;
>  	memset(state, 0, sizeof(*state));
> +	state->plugged = old_plugged;

Unfortunately, I lack the context to understand the purpose of this. Is
the idea that `plugged` gives an indication whether we're still within
that batch that should be fsync'ed all at once?

I only see one caller where this would make a difference, and that caller
is `deflate_to_pack()`. Maybe we should just start that function with
`unsigned save_plugged:1 = state->plugged;` and restore it after the
`while (1)` loop?
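One note on the `:1` suggestions: C only allows a bit-field width on members of a struct or union, so `unsigned save_plugged:1 = state->plugged;` would not compile as a local variable; a plain `unsigned` holds the saved value just as well. Below is a sketch of the save/restore-around-the-loop idea, using cut-down hypothetical stand-ins for `struct bulk_checkin_state`, `finish_bulk_checkin()` and `deflate_to_pack()`.

```c
#include <string.h>

/* Hypothetical, cut-down stand-in for struct bulk_checkin_state. */
struct checkin_state {
	unsigned plugged:1;
	int nr_written;
};

/* Stand-in for finish_bulk_checkin(): it wipes the whole struct. */
static void finish_checkin(struct checkin_state *state)
{
	memset(state, 0, sizeof(*state));
}

/* Stand-in for deflate_to_pack(): may finish a pack on every round. */
static void deflate_rounds(struct checkin_state *state, int rounds)
{
	/* Plain unsigned: a ":1" width is not allowed on a local variable. */
	unsigned save_plugged = state->plugged;

	while (rounds-- > 0) {
		state->nr_written++;
		finish_checkin(state); /* resets plugged to 0 */
	}
	state->plugged = save_plugged;
}
```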

>
>  	strbuf_release(&packname);
>  	/* Make objects we just wrote available to ourselves */
>  	reprepare_packed_git(the_repository);
>  }
>
> +static void do_sync_and_rename(struct bulk_rename_state *state, struct lock_file *lock_file)
> +{
> +	if (state->nr_renames) {
> +		int i;
> +
> +		/*
> +		 * Issue a full hardware flush against the lock file to ensure
> +		 * that all objects are durable before any renames occur.
> +		 * The code in fsync_and_close_loose_object_bulk_checkin has
> +		 * already ensured that writeout has occurred, but it has not
> +		 * flushed any writeback cache in the storage hardware.
> +		 */
> +		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
> +
> +		for (i = 0; i < state->nr_renames; i++) {
> +			if (finalize_object_file(state->renames[i].src, state->renames[i].dst))
> +				die_errno(_("could not rename '%s'"), state->renames[i].src);
> +
> +			free(state->renames[i].src);
> +			free(state->renames[i].dst);
> +		}
> +
> +		free(state->renames);
> +		memset(state, 0, sizeof(*state));

Hmm. There is a lot of `memset()`ing going on, and I am not quite sure
that I like what I am seeing. It does not help that there are now two very
easily-confused structs: `bulk_rename_state` and `bulk_checkin_state`.
That made me worry at first that we might be inadvertently resetting
the `renames` field in `finish_bulk_checkin()`.

Maybe we can do this instead?

		FREE_AND_NULL(state->renames);
		state->nr_renames = state->alloc_renames = 0;

> +	}
> +}
> +
>  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
>  {
>  	int i;
> @@ -256,25 +299,69 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
>  	return 0;
>  }
>
> +static void add_rename_bulk_checkin(struct bulk_rename_state *state,
> +				    const char *src, const char *dst)
> +{
> +	struct object_rename *rename;
> +
> +	ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
> +
> +	rename = &state->renames[state->nr_renames++];
> +	rename->src = xstrdup(src);
> +	rename->dst = xstrdup(dst);
> +}
> +
> +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
> +					      const char *filename)
> +{
> +	if (fsync_object_files) {
> +		/*
> +		 * If we have a plugged bulk checkin, we issue a call that
> +		 * cleans the filesystem page cache but avoids a hardware flush
> +		 * command. Later on we will issue a single hardware flush
> +		 * before renaming files as part of do_sync_and_rename.
> +		 */
> +		if (bulk_checkin_state.plugged &&
> +		    fsync_object_files == 2 &&
> +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
> +			add_rename_bulk_checkin(&bulk_rename_state, tmpfile, filename);
> +			if (close(fd))
> +				die_errno(_("error when closing loose object file"));
> +
> +			return 0;
> +
> +		} else {
> +			fsync_or_die(fd, "loose object file");
> +		}
> +	}
> +
> +	if (close(fd))
> +		die_errno(_("error when closing loose object file"));
> +
> +	return finalize_object_file(tmpfile, filename);
> +}
> +
>  int index_bulk_checkin(struct object_id *oid,
>  		       int fd, size_t size, enum object_type type,
>  		       const char *path, unsigned flags)
>  {
> -	int status = deflate_to_pack(&state, oid, fd, size, type,
> +	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
>  				     path, flags);
> -	if (!state.plugged)
> -		finish_bulk_checkin(&state);
> +	if (!bulk_checkin_state.plugged)
> +		finish_bulk_checkin(&bulk_checkin_state);
>  	return status;
>  }
>
>  void plug_bulk_checkin(void)
>  {
> -	state.plugged = 1;
> +	bulk_checkin_state.plugged = 1;
>  }
>
> -void unplug_bulk_checkin(void)
> +void unplug_bulk_checkin(struct lock_file *lock_file)
>  {
> -	state.plugged = 0;
> -	if (state.f)
> -		finish_bulk_checkin(&state);
> +	bulk_checkin_state.plugged = 0;
> +	if (bulk_checkin_state.f)
> +		finish_bulk_checkin(&bulk_checkin_state);
> +
> +	do_sync_and_rename(&bulk_rename_state, lock_file);
>  }
> diff --git a/bulk-checkin.h b/bulk-checkin.h
> index b26f3dc3b74..8efb01ed669 100644
> --- a/bulk-checkin.h
> +++ b/bulk-checkin.h
> @@ -6,11 +6,13 @@
>
>  #include "cache.h"
>
> +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile, const char *filename);
> +
>  int index_bulk_checkin(struct object_id *oid,
>  		       int fd, size_t size, enum object_type type,
>  		       const char *path, unsigned flags);
>
>  void plug_bulk_checkin(void);
> -void unplug_bulk_checkin(void);
> +void unplug_bulk_checkin(struct lock_file *);
>
>  #endif
> diff --git a/config.c b/config.c
> index f33abeab851..375bdb24b0a 100644
> --- a/config.c
> +++ b/config.c
> @@ -1509,7 +1509,9 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>  	}
>
>  	if (!strcmp(var, "core.fsyncobjectfiles")) {
> -		fsync_object_files = git_config_bool(var, value);
> +		int is_bool;
> +
> +		fsync_object_files = git_config_bool_or_int(var, value, &is_bool);
>  		return 0;
>  	}
>
> diff --git a/config.mak.uname b/config.mak.uname
> index 69413fb3dc0..8c07f2265a8 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
>  	HAVE_CLOCK_MONOTONIC = YesPlease
>  	# -lrt is needed for clock_gettime on glibc <= 2.16
>  	NEEDS_LIBRT = YesPlease
> +	HAVE_SYNC_FILE_RANGE = YesPlease
>  	HAVE_GETDELIM = YesPlease
>  	SANE_TEXT_GREP=-a
>  	FREAD_READS_DIRECTORIES = UnfortunatelyYes
> @@ -133,6 +134,7 @@ ifeq ($(uname_S),Darwin)
>  	COMPAT_OBJS += compat/precompose_utf8.o
>  	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
>  	BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1
> +	BASIC_CFLAGS += -DFSYNC_DOESNT_FLUSH=1
>  	HAVE_BSD_SYSCTL = YesPlease
>  	FREAD_READS_DIRECTORIES = UnfortunatelyYes
>  	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
> diff --git a/configure.ac b/configure.ac
> index 031e8d3fee8..c711037d625 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
>  	[AC_MSG_RESULT([no])
>  	HAVE_CLOCK_MONOTONIC=])
>  GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
> +
> +#
> +# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
> +GIT_CHECK_FUNC(sync_file_range,
> +	[HAVE_SYNC_FILE_RANGE=YesPlease],
> +	[HAVE_SYNC_FILE_RANGE])
> +GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
> +
>  #
>  # Define NO_SETITIMER if you don't have setitimer.
>  GIT_CHECK_FUNC(setitimer,
> diff --git a/git-compat-util.h b/git-compat-util.h
> index b46605300ab..d14e2436276 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
>  void BUG(const char *fmt, ...);
>  #endif
>
> +enum fsync_action {
> +    FSYNC_WRITEOUT_ONLY,
> +    FSYNC_HARDWARE_FLUSH
> +};
> +
> +int git_fsync(int fd, enum fsync_action action);
> +
>  /*
>   * Preserves errno, prints a message, but gives no warning for ENOENT.
>   * Returns 0 on success, which includes trying to unlink an object that does
> diff --git a/object-file.c b/object-file.c
> index 607e9e2f80b..5f04143dde0 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1859,16 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  	return 0;
>  }
>
> -/* Finalize a file on disk, and close it. */
> -static int close_loose_object(int fd, const char *tmpfile, const char *filename)
> -{
> -	if (fsync_object_files)
> -		fsync_or_die(fd, "loose object file");
> -	if (close(fd) != 0)
> -		die_errno(_("error when closing loose object file"));
> -	return finalize_object_file(tmpfile, filename);
> -}
> -
>  /* Size of directory component, including the ending '/' */
>  static inline int directory_size(const char *filename)
>  {
> @@ -1982,7 +1972,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
>  	}
>
> -	return close_loose_object(fd, tmp_file.buf, filename.buf);
> +	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf, filename.buf);
>  }
>
>  static int freshen_loose_object(const struct object_id *oid)
> diff --git a/wrapper.c b/wrapper.c
> index 563ad590df1..37a8b61a7df 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -538,6 +538,42 @@ int xmkstemp_mode(char *filename_template, int mode)
>  	return fd;
>  }
>
> +int git_fsync(int fd, enum fsync_action action)
> +{
> +	if (action == FSYNC_WRITEOUT_ONLY) {
> +#ifdef __APPLE__
> +		/*
> +		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
> +		 * flush hardware caches.
> +		 */
> +		return fsync(fd);
> +#endif
> +
> +#ifdef HAVE_SYNC_FILE_RANGE
> +		/*
> +		 * On linux 2.6.17 and above, sync_file_range is the way to issue
> +		 * a writeback without a hardware flush. An offset of 0 and size of 0
> +		 * indicates writeout of the entire file and the wait flags ensure that all
> +		 * dirty data is written to the disk (potentially in a disk-side cache)
> +		 * before we continue.
> +		 */
> +
> +		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
> +						 SYNC_FILE_RANGE_WRITE |
> +						 SYNC_FILE_RANGE_WAIT_AFTER);
> +#endif
> +
> +		errno = ENOSYS;
> +		return -1;
> +	}

Hmm. I wonder whether we can do this more consistently with how Git
usually does platform-specific things.

In the 3rd patch, the one where you implemented Windows-specific support,
in the Git for Windows PR at
https://github.com/git-for-windows/git/pull/3391, you introduce a
`mingw_fsync_no_flush()` function and define `fsync_no_flush` to expand to
that function name.

This is very similar to how Git does things. Take for example the
`offset_1st_component` macro:
https://github.com/git/git/blob/v2.33.0/git-compat-util.h#L386-L392

Unless defined in a platform-specific manner, it is defined in
`git-compat-util.h`:

	#ifndef offset_1st_component
	static inline int git_offset_1st_component(const char *path)
	{
		return is_dir_sep(path[0]);
	}
	#define offset_1st_component git_offset_1st_component
	#endif

And on Windows, it is defined as following
(https://github.com/git/git/blob/v2.33.0/compat/win32/path-utils.h#L34-L35),
before the lines quoted above:

	int win32_offset_1st_component(const char *path);
	#define offset_1st_component win32_offset_1st_component

We could do the exact same thing here. Define a platform-specific
`mingw_fsync_no_flush()` in `compat/mingw.h` and define the macro
`fsync_no_flush` to point to it. In `git-compat-util.h`, in the
`__APPLE__`-specific part, implement it via `fsync()`. And later, in the
platform-independent part, _iff_ the macro has not yet been defined,
implement an inline function that does that `HAVE_SYNC_FILE_RANGE` dance
and falls back to `ENOSYS`.

That would contain the platform-specific `#ifdef` blocks to
`git-compat-util.h`, which is exactly where we want them.

> +
> +#ifdef __APPLE__
> +	return fcntl(fd, F_FULLFSYNC);
> +#else
> +	return fsync(fd);
> +#endif

Same thing here. We would probably want something like `fsync_with_flush`
here.
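A rough sketch of the layout suggested above follows. It assumes the helper names `fsync_no_flush` and `fsync_with_flush` proposed in this thread. The `git_*` function names and the exact `#ifdef` arrangement are illustrative only and are not taken from any posted patch. `HAVE_SYNC_FILE_RANGE` would come from the Makefile knob introduced in the patch.

```c
#define _GNU_SOURCE /* exposes sync_file_range() in <fcntl.h> on glibc */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Platform-specific part (would sit in the __APPLE__ section of git-compat-util.h). */
#ifdef __APPLE__
/* On macOS, fsync() writes back the page cache but does not flush the drive cache. */
static inline int git_fsync_no_flush(int fd)
{
	return fsync(fd);
}
#define fsync_no_flush git_fsync_no_flush

/* F_FULLFSYNC additionally asks the drive to flush its write cache. */
static inline int git_fsync_with_flush(int fd)
{
	return fcntl(fd, F_FULLFSYNC);
}
#define fsync_with_flush git_fsync_with_flush
#endif

/* Generic fallbacks, used only if no platform defined the macros above. */
#ifndef fsync_no_flush
static inline int git_fsync_no_flush_generic(int fd)
{
#ifdef HAVE_SYNC_FILE_RANGE
	/* Offset 0 and length 0 mean "the whole file"; wait for writeout to finish. */
	return sync_file_range(fd, 0, 0,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	(void)fd;
	errno = ENOSYS; /* the caller is expected to fall back to a full fsync */
	return -1;
#endif
}
#define fsync_no_flush git_fsync_no_flush_generic
#endif

#ifndef fsync_with_flush
#define fsync_with_flush fsync
#endif
```

With this layout, `git_fsync()` in wrapper.c would reduce to choosing between `fsync_no_flush(fd)` and `fsync_with_flush(fd)`. The `#ifdef` blocks would then stay in one header.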

It is my hope that you find my comments and suggestions helpful.

Thank you,
Johannes

> +}
> +
>  static int warn_if_unremovable(const char *op, const char *file, int rc)
>  {
>  	int err;
> diff --git a/write-or-die.c b/write-or-die.c
> index d33e68f6abb..8f53953d4ab 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>
>  void fsync_or_die(int fd, const char *msg)
>  {
> -	while (fsync(fd) < 0) {
> +	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);
>  	}
> --
> gitgitgadget
>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25 18:52   ` Johannes Schindelin
@ 2021-08-25 21:26     ` Junio C Hamano
  2021-08-26  1:19     ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-08-25 21:26 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Neeraj Singh via GitGitGadget, git, Neeraj-Personal, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> It makes sense, but I would recommend using a more easily explained value
> than `2`. Maybe `delayed`? Or `bulk` or `batched`?

While we have less than 100% confidence in the implementation, it
may make sense to have such a knob to choose between "do we fsync
the old, known-safe but slow way, or do we fsync in batch"
behaviours, and I agree that the knob should not be called cryptic
"2".

But in a distant future when this new way of flushing proves to be
stable, it would make sense if the new behaviour were triggered by
the plain vanilla 'true', no?  In a sense, running fsync in a batch
(or using syncfs) is an implementation detail of "we sync after
writing out object files and before declaring success".

Thanks.

* Re: [PATCH 1/2] object-file: use futimes rather than utime
  2021-08-25 13:51   ` Johannes Schindelin
@ 2021-08-25 22:08     ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-08-25 22:08 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Neeraj Singh via GitGitGadget, Git List, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Wed, Aug 25, 2021 at 6:51 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> If I were you, I would spell out "file descriptor" here.
Will do.

> > diff --git a/compat/mingw.c b/compat/mingw.c
> > index 9e0cd1e097f..948f4c3428b 100644
> > --- a/compat/mingw.c
> > +++ b/compat/mingw.c
> > @@ -949,19 +949,40 @@ int mingw_fstat(int fd, struct stat *buf)
> >       }
> >  }
> >
> > -static inline void time_t_to_filetime(time_t t, FILETIME *ft)
> > +static inline void timeval_to_filetime(const struct timeval *t, FILETIME *ft)
> >  {
> > -     long long winTime = t * 10000000LL + 116444736000000000LL;
> > +     long long winTime = t->tv_sec * 10000000LL + t->tv_usec * 10 + 116444736000000000LL;
>
> Technically, this is a change in behavior, right? We did not use to use
> nanosecond precision. But I don't think that we actually make use of this
> in this patch.
>
> >       ft->dwLowDateTime = winTime;
> >       ft->dwHighDateTime = winTime >> 32;
> >  }
> >
> > -int mingw_utime (const char *file_name, const struct utimbuf *times)
> > +int mingw_futimes(int fd, const struct timeval times[2])
>
> At first, I wondered whether it would make sense to pass the access time
> and the modified time separately, as pointers. I don't think that we pass
> around arrays as function parameters in Git anywhere else.
>
> But then I realized that `futimes()` is available in this precise form on
> Linux and on the BSDs. Therefore, it is not up to us to decide the
> function's signature.
>
> However, now that I looked at the manual page, I noticed that this
> function is not part of any POSIX standard.
>
> Which makes me think that we will have to do a bit more than just define
> it on Windows: we will have to introduce a `Makefile` knob (just like you
> did with `HAVE_SYNC_FILE_RANGE` in patch 2/2) and set that specifically
> for Linux and the BSDs, and use `futimes()` only if it is available
> (otherwise fall back to `utime()`).
>
> Then, as a separate patch, we should introduce this Windows-specific shim
> and declare that it is available via `config.mak.uname`.

Thanks for taking another look at the man pages. I looked again too and saw
that futimens is part of POSIX.1-2008:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html.
If I switch to futimens and implement the Windows shim at the same time, is
that sufficient to address your feedback? I'd rather not #ifdef this one,
since the code flow is quite different depending on whether the API is
available.
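For illustration, here is a minimal POSIX.1-2008 sketch of updating timestamps through an already-open file descriptor with `futimens()`. The helper name `set_times_fd` is hypothetical and is not part of the patch. Passing `NULL` instead of the array would set both times to the current time.

```c
#define _POSIX_C_SOURCE 200809L /* futimens() is POSIX.1-2008 */
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: set the access and modification times of an open
 * file to whole seconds. times[0] is atime, times[1] is mtime.
 */
static int set_times_fd(int fd, time_t atime, time_t mtime)
{
	struct timespec times[2];

	times[0].tv_sec = atime;
	times[0].tv_nsec = 0;
	times[1].tv_sec = mtime;
	times[1].tv_nsec = 0;
	return futimens(fd, times);
}
```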

> > +}
> > +
> > +int mingw_utime (const char *file_name, const struct utimbuf *times)
>
> Please lose the space between the function name and the opening
> parenthesis. I know, the preimage of this diff has it, but that was an
> oversight and definitely disagrees with our current coding style.
Will do.

>
> > +{
> >       int fh, rc;
> >       DWORD attrs;
> >       wchar_t wfilename[MAX_PATH];
> > +     struct timeval tvs[2];
> > +
> >       if (xutftowcs_path(wfilename, file_name) < 0)
> >               return -1;
> >
> > @@ -979,17 +1000,12 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
> >       }
> >
> >       if (times) {
> > -             time_t_to_filetime(times->modtime, &mft);
> > -             time_t_to_filetime(times->actime, &aft);
> > -     } else {
> > -             GetSystemTimeAsFileTime(&mft);
> > -             aft = mft;
> > +             memset(tvs, 0, sizeof(tvs));
> > +             tvs[0].tv_sec = times->actime;
> > +             tvs[1].tv_sec = times->modtime;
>
> It is too bad that we have to copy around those values just to convert
> them, but I cannot think of any better way, either. And it's not like
> we're in a hot loop: this code will be dominated by I/O anyways.

Yeah, the cost of this is approximately 3-4 cycles (load-to-use
latency), so no one will notice relative to the system call overhead.
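As a worked check of the quoted `timeval_to_filetime()` arithmetic: a Windows FILETIME counts 100-nanosecond ticks since 1601-01-01. The Unix epoch falls 11644473600 seconds later, which gives the constant 116444736000000000. Because `tv_usec` is in microseconds, multiplying it by 10 yields 100 ns ticks. The new code therefore gains microsecond (not nanosecond) precision. The portable function below is a hypothetical restatement that returns the 64-bit tick count instead of filling a `FILETIME`.

```c
#include <stdint.h>
#include <sys/time.h>

/* 100 ns ticks between 1601-01-01 and 1970-01-01 (11644473600 s * 10^7). */
#define EPOCH_DIFF_TICKS 116444736000000000LL

/* Hypothetical portable restatement of timeval_to_filetime(). */
static int64_t timeval_to_filetime_ticks(const struct timeval *t)
{
	return (int64_t)t->tv_sec * 10000000LL  /* seconds -> 100 ns ticks */
	       + (int64_t)t->tv_usec * 10       /* microseconds -> 100 ns ticks */
	       + EPOCH_DIFF_TICKS;
}

/* Split the tick count into the two 32-bit halves stored in a FILETIME. */
static void split_ticks(int64_t ticks, uint32_t *low, uint32_t *high)
{
	*low = (uint32_t)ticks;
	*high = (uint32_t)((uint64_t)ticks >> 32);
}
```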

> > diff --git a/object-file.c b/object-file.c
> > index a8be8994814..607e9e2f80b 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1860,12 +1860,13 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
> >  }
> >
> >  /* Finalize a file on disk, and close it. */
> > -static void close_loose_object(int fd)
> > +static int close_loose_object(int fd, const char *tmpfile, const char *filename)
> >  {
> >       if (fsync_object_files)
> >               fsync_or_die(fd, "loose object file");
> >       if (close(fd) != 0)
> >               die_errno(_("error when closing loose object file"));
> > +     return finalize_object_file(tmpfile, filename);
>
> While this is a clear change of behavior, this function has only one
> caller, and that caller is adjusted accordingly.
>
> Could you add this clarification of context to the commit message? I know
> it will help me in the future, when I have to get up to speed again by
> reading the commit history.

How does the following revised wording sound?
```
    object-file: use futimens rather than utime

    Make close_loose_object do all of the steps for syncing and correctly
    naming a new loose object so that it can be reimplemented in the
    upcoming bulk-fsync mode.

    Use futimens, which is available in POSIX.1-2008, to update the file
    timestamps. This should be slightly faster than utime, since we already
    have a file descriptor available. This change allows us to update
    the time before closing, renaming, and potentially fsyncing the file
    being refreshed. This code is currently only invoked by git-pack-objects
    via force_object_loose.

    Implement a futimens shim for the Windows port of Git.

    Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
```

Thanks for the detailed and quick feedback!
-Neeraj

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
@ 2021-08-26  0:49     ` Neeraj Singh
  2021-08-26  5:50       ` Christoph Hellwig
  2021-08-26  5:57     ` Christoph Hellwig
  1 sibling, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-08-26  0:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Neeraj Singh

On Wed, Aug 25, 2021 at 9:31 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Wed, Aug 25 2021, Neeraj Singh via GitGitGadget wrote:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > When adding many objects to a repo with core.fsyncObjectFiles set to
> > true, the cost of fsync'ing each object file can become prohibitive.
> >
> > One major source of the cost of fsync is the implied flush of the
> > hardware writeback cache within the disk drive. Fortunately, Windows,
> > MacOS, and Linux each offer mechanisms to write data from the filesystem
> > page cache without initiating a hardware flush.
> >
> > This patch introduces a new 'core.fsyncObjectFiles = 2' option that
> > takes advantage of the bulk-checkin infrastructure to batch up hardware
> > flushes.
> >
> > When the new mode is enabled we do the following for new objects:
> >
> > 1. Create a tmp_obj_XXXX file and write the object data to it.
> > 2. Issue a pagecache writeback request and wait for it to complete.
> > 3. Record the tmp name and the final name in the bulk-checkin state for
> >    later rename.
> >
> > At the end of the entire transaction we:
> > 1. Issue a fsync against the lock file to flush the hardware writeback
> >    cache, which should by now have processed the tmp file writes.
> > 2. Rename all of the temp files to their final names.
> > 3. When updating the index and/or refs, we will issue another fsync
> >    internal to that operation.
> >
> > On a filesystem with a singular journal that is updated during name
> > operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> > would expect the fsync to trigger a journal writeout so that this
> > sequence is enough to ensure that the user's data is durable by the time
> > the git command returns.
> >
> > This change also updates the MacOS code to trigger a real hardware flush
> > via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> > MacOS there was no guarantee of durability since a simple fsync(2) call
> > does not flush any hardware caches.
>
> Thanks for working on this, good to see fsck issues picked up after some
> on-list pause.
>
> > diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> > index c04f62a54a1..3b672c2db67 100644
> > --- a/Documentation/config/core.txt
> > +++ b/Documentation/config/core.txt
> > @@ -548,12 +548,17 @@ core.whitespace::
> >    errors. The default tab width is 8. Allowed values are 1 to 63.
> >
> >  core.fsyncObjectFiles::
> > -     This boolean will enable 'fsync()' when writing object files.
> > -+
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +     A boolean value or the number '2', indicating the level of durability
> > +     applied to object files.
> > ++
> > +This setting controls how much effort Git makes to ensure that data added to
> > +the object store are durable in the case of an unclean system shutdown. If
> > +'false', Git allows data to remain in file system caches according to operating
> > +system policy, whence they may be lost if the system loses power or crashes. A
> > +value of 'true' instructs Git to force objects to stable storage immediately
> > +when they are added to the object store. The number '2' is an experimental
> > +value that also preserves durability but tries to perform hardware flushes in a
> > +batch.
>
> Some feedback/thoughts:
>
> 0) Let's not expose "2" to users, but give it some friendly config name
> and just translate this to the enum internally.

Agreed. I'll follow suggestions from here and elsewhere to make this a
human-readable string. Is "core.fsyncObjectFiles=batch" acceptable?

>
> 1) Your commit message says "When updating the index and/or refs[...]"
> but we're changing core.fsyncObjectFiles here, I assume that's
> summarizing existing behavior then
That's what I intended. I'll make the patch description more explicit
about that. In general, this patch is only concerned with loose object
files. My assumption is that other parts of the system (like the refs db)
are responsible for their own data consistency and durability.

I do think that, long-term, the Git community should consider a general
transaction mechanism with redo logging, to have a consistent method for
achieving durability.

>
> 2) You say "when adding many [loose] objects to a repo[...]", and the
> test-case is "git stash push", but for e.g. accepting pushes we have
> transfer.unpackLimit.
>
> It would be interesting to see if/how this impacts performance there,
> and also if not that should at least be called out in
> documentation. I.e. users might want to set this differently on servers
> v.s. checkouts.
>
> But also, is this sort of thing something we could mitigate even more in
> commands like "git stash push" by just writing a pack instead of N loose
> objects?
>
> I don't think such questions should preclude changing the fsync
> approach, or offering more options, but they should inform our
> longer-term goals.

Dealing only/mostly in packfiles would be a great approach. I'd hope that
this fsyncing work would mostly be superseded if such a change is rolled out.
I just read about the geometric repacking stuff, and it looks reminiscent of the
LSM-tree approach to databases.

>
> 3) Re some of the musings about fsync() recently in
> https://lore.kernel.org/git/877dhs20x3.fsf@evledraar.gmail.com/; is this
> method of doing not-quite-an-fsync guaranteed by some OS's / POSIX etc,
> or is it more like the initial approach before core.fsyncObjectFiles,
> i.e. the happy-go-lucky approach described in the "[...]that orders data
> writes properly[...]" documentation you're removing.

I am confident about the validity of the batched approach on Windows when
running on NTFS and ReFS, given my background as a Windows FS developer.
We are unlikely to weaken any mainstream data consistency behavior of our
filesystems, given the type of errors such changes would cause.

In Windows, we call the FS requirement that supports this change "external
metadata consistency": all metadata operations that could have led to a
state that is later fsynced must be visible after the fsync.
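To make the ordering concrete, here is a minimal standalone sketch of the batched sequence from the commit message. It assumes the filesystem provides the external metadata consistency described above. The helper names are hypothetical. On Linux, the writeout-only step uses `sync_file_range()` and otherwise falls back to `fsync()`. On macOS, the single flush would need `fcntl(fd, F_FULLFSYNC)` rather than `fsync()`.

```c
#define _GNU_SOURCE /* sync_file_range() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: write back the page cache for fd without a drive cache flush. */
static int writeout_only(int fd)
{
#ifdef __linux__
	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
			    SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER) == 0)
		return 0;
	if (errno != ENOSYS && errno != EINVAL)
		return -1;
#endif
	return fsync(fd); /* conservative fallback */
}

struct pending_rename {
	char tmp[256];
	char dst[256];
};

/* Per object: write a temp file, write it back, and record the rename for later. */
static int stage_object(const char *tmp, const char *dst, const char *data,
			struct pending_rename *pr)
{
	size_t len = strlen(data);
	int fd = open(tmp, O_CREAT | O_EXCL | O_WRONLY, 0600);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || writeout_only(fd) < 0) {
		close(fd);
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	snprintf(pr->tmp, sizeof(pr->tmp), "%s", tmp);
	snprintf(pr->dst, sizeof(pr->dst), "%s", dst);
	return 0;
}

/* End of transaction: one hardware flush via flush_fd, then all the renames. */
static int commit_batch(int flush_fd, const struct pending_rename *prs, int n)
{
	if (fsync(flush_fd) < 0)
		return -1;
	for (int i = 0; i < n; i++)
		if (rename(prs[i].tmp, prs[i].dst) < 0)
			return -1;
	return 0;
}
```

Like the patch, this sketch flushes through a separate file descriptor (the patch uses the index lock file). Whether that single flush also makes the earlier temp-file writes durable is exactly the filesystem property under discussion here.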

macOS's fsync documentation indicates that they are likely to implement
the required guarantees. They specifically say that fsync triggers
writeback of data and metadata, but does not issue a hardware flush.
Please see the doc at
https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/fsync.2.html.
It is also notable that Apple SSDs have particularly bad performance for
flush operations (we noticed this when booting Windows through BootCamp as
well).

Unfortunately, the man pages and documentation I could find don't give me
this level of confidence for typical Linux filesystems. For instance, the
need to fsync the parent directory in order to make an inode's link
findable eliminates a lot of the advantage of this change, though we could
batch those directory fsyncs and would have to do at most 256 (one per
loose-object fan-out directory).
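Here is a hypothetical sketch of the directory batching mentioned above. After the renames, it fsyncs each distinct `objects/xx` fan-out directory once. At most 256 directory fsyncs are therefore issued per batch, however many objects were added. Opening a directory read-only and calling `fsync()` on it is the usual Linux idiom. The function names here are illustrative.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: value of one lowercase hex digit, or -1. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical helper: fsync each distinct <objdir>/xx fan-out directory
 * named by the first two hex digits of the given object ids, once each.
 * Returns the number of directories synced, or -1 on error.
 */
static int fsync_fanout_dirs(const char *objdir, const char **oids, int n)
{
	unsigned char seen[256] = { 0 };
	char path[4096];
	int synced = 0;

	for (int i = 0; i < n; i++) {
		int hi = hexval(oids[i][0]);
		int lo = hi < 0 ? -1 : hexval(oids[i][1]);
		struct stat st;
		int dfd;

		if (hi < 0 || lo < 0)
			return -1;
		if (seen[hi * 16 + lo])
			continue; /* this directory was already synced */
		seen[hi * 16 + lo] = 1;

		snprintf(path, sizeof(path), "%s/%c%c", objdir, oids[i][0], oids[i][1]);
		dfd = open(path, O_RDONLY);
		if (dfd < 0)
			return -1;
		if (fstat(dfd, &st) < 0 || !S_ISDIR(st.st_mode) || fsync(dfd) < 0) {
			close(dfd);
			return -1;
		}
		close(dfd);
		synced++;
	}
	return synced;
}
```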

This thread is somewhat instructive, but inconclusive:
https://lwn.net/ml/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/.
One conclusion from reviewing that thread is that, as of then,
sync_file_range isn't actually enough to make a hard guarantee about
writeout occurring. See
https://lore.kernel.org/linux-fsdevel/20190319204330.GY26298@dastard/.
My hope is that the Linux FS developers have rectified that shortcoming by now.

>
> 4) While that documentation written by Linus long ago is rather
> flippant, I think just removing it and not replacing it with some
> discussion about how this is a trade-off v.s. real-world filesystem
> semantics isn't a good change.

I don't think the replaced documentation applies anymore to an ext4 or XFS
system with delayed allocation. Those filesystems effectively have
data=writeback semantics, because they don't need data ordering to avoid
exposing unwritten data, and so don't write the data at any particular
syscall boundary or in any particular order.

I think my updated documentation for "= false" is accurate and more
helpful from a user perspective ("it is up to OS policy when your data
becomes durable in the event of an unclean shutdown"). "= true" also has a
reasonable description, though I might add some wording indicating that
this setting could be costly.

I'll take a crack at improving the batched mode documentation.

>
> 5) On a similar thought as transfer.unpackLimit in #2, I wonder if this
> fsync() setting shouldn't really be something we should be splitting
> up. I.e. maybe handle batch loose object writes one way, ref updates
> another way etc. I think moving core.fsync* to a setting like what we
> have for fsck.* and <cmd>.fsck.* is probably a better thing to do in the
> longer term.
>
> I.e. being able to do things like:
>
>     fsync.objectFiles = none
>     fsync.refFiles = cache # or "hardware"
>     receive.fsync.objectFiles = hardware
>     receive.fsync.refFiles = hardware
>
> Or whatever, i.e. we're using one hammer for all of these now, but I
> suspect most users who care about fsync care about /some/ fsync, not
> everything.
>

I disagree. I believe Git should offer a consolidated config setting
with two overall goals:

1) A consistent high-integrity setting across the entire git
index/object/ref state, primarily for people using a repo for active
development of changes. This should roughly guarantee that when a git
command that adds data to the repo completes, the data is durable within
git, including the refs needed to find it.

2) A lower-integrity setting useful for build/CI, maintainers who are
applying lots of patches, etc., where it is expected that the data is
available elsewhere and can be recovered easily.

> 6) Inline comments below.
>
> > +struct object_rename {
> > +     char *src;
> > +     char *dst;
> > +};
> > +
> > +static struct bulk_rename_state {
> > +     struct object_rename *renames;
> > +     uint32_t alloc_renames;
> > +     uint32_t nr_renames;
> > +} bulk_rename_state;
>
> In a crash of git itself it seems we're going to leave some litter
> behind in the object dir now, and "git gc" won't know how to clean it
> up. I think this is going to want to just use the tmp-objdir.[ch] API,
> which might or might not need to be extended for loose objects / some
> oddities of this use-case.
>

It appears that "git prune" would take care of these files.

> Also, if you have a pair of things like this the string-list API is much
> more pleasing to use than coming up with your own encapsulation.

Thanks, I'll update to use the string-list API.

> >  static struct bulk_checkin_state {
> >       unsigned plugged:1;
> >
> > @@ -21,13 +33,15 @@ static struct bulk_checkin_state {
> >       struct pack_idx_entry **written;
> >       uint32_t alloc_written;
> >       uint32_t nr_written;
> > -} state;
>
>
> > +
> > +             free(state->renames);
> > +             memset(state, 0, sizeof(*state));
>
> So with this and other use of the "state" variable is this part of
> bulk-checkin going to become thread-unsafe, was that already the case?

Yes, this code was already thread-unsafe in the "bulk checkin plugged"
mode, and that hasn't changed. Is it worth fixing this right now? Is there
a preexisting example of code that uses thread-local storage inside a
library function and then merges the state later? Bonus points for only
doing the thread-local stuff if other threads are actually active.

> > +static void add_rename_bulk_checkin(struct bulk_rename_state *state,
> > +                                 const char *src, const char *dst)
> > +{
> > +     struct object_rename *rename;
> > +
> > +     ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
> > +
> > +     rename = &state->renames[state->nr_renames++];
> > +     rename->src = xstrdup(src);
> > +     rename->dst = xstrdup(dst);
> > +}
>
> All boilerplate duplicating things you'd get with a string-list for free...

Will fix. Thanks.

>
> > +             /*
> > +              * If we have a plugged bulk checkin, we issue a call that
> > +              * cleans the filesystem page cache but avoids a hardware flush
> > +              * command. Later on we will issue a single hardware flush
> > +              * before renaming files as part of do_sync_and_rename.
> > +              */
>
> So this is the sort of thing I meant by extending Linus's docs, I know
> some FS's work this way, but not all do.
>
> Also there's no guarantee in git that your .git is on one FS, so I think
> even for the FS's you have in mind this might not be an absolute
> guarantee...

This is unfortunately an issue that isn't resolvable within Git. I think there's
value in supporting batched mode for the common systems and filesystems
that do support the guarantee.  I'd like to be able to set batched mode as the
default on Windows eventually.  It also looks like it can be default on macOS.

I think Linux might be able to get to the desired semantics for default distro
filesystems with minimal changes. Perhaps this new mode in Git would give
them a motivation to do so.

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25 18:52   ` Johannes Schindelin
  2021-08-25 21:26     ` Junio C Hamano
@ 2021-08-26  1:19     ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-08-26  1:19 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Neeraj Singh via GitGitGadget, Git List, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Wed, Aug 25, 2021 at 11:52 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Wed, 25 Aug 2021, Neeraj Singh via GitGitGadget wrote:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > When adding many objects to a repo with core.fsyncObjectFiles set to
> > true, the cost of fsync'ing each object file can become prohibitive.
> >
> > One major source of the cost of fsync is the implied flush of the
> > hardware writeback cache within the disk drive. Fortunately, Windows,
> > MacOS, and Linux each offer mechanisms to write data from the filesystem
> > page cache without initiating a hardware flush.
> >
> > This patch introduces a new 'core.fsyncObjectFiles = 2' option that
> > takes advantage of the bulk-checkin infrastructure to batch up hardware
> > flushes.
>
> It makes sense, but I would recommend using a more easily explained value
> than `2`. Maybe `delayed`? Or `bulk` or `batched`?
>
> The way this would be implemented would look somewhat like the
> implementation for `core.abbrev`, which also accepts a string ("auto") or
> a Boolean (or even an integral number), see
> https://github.com/git/git/blob/v2.33.0/config.c#L1367-L1381:
>
>         if (!strcmp(var, "core.abbrev")) {
>                 if (!value)
>                         return config_error_nonbool(var);
>                 if (!strcasecmp(value, "auto"))
>                         default_abbrev = -1;
>                 else if (!git_parse_maybe_bool_text(value))
>                         default_abbrev = the_hash_algo->hexsz;
>                 else {
>                         int abbrev = git_config_int(var, value);
>                         if (abbrev < minimum_abbrev || abbrev > the_hash_algo->hexsz)
>                                 return error(_("abbrev length out of range: %d"), abbrev);
>                         default_abbrev = abbrev;
>                 }
>                 return 0;
>         }
>

Thanks for the code example. I'll follow something similar. I prefer the name
"batch" for the new fsync mode.
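Here is a self-contained sketch of such parsing, loosely modelled on the `core.abbrev` snippet quoted above. The enum, the function names, and the simplified boolean parser are all hypothetical. The boolean parser stands in for Git's `git_parse_maybe_bool_text()`, which this sketch does not call. As with Git booleans, a bare key with no value (`value == NULL`) is treated as true.

```c
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Hypothetical simplified boolean parser: 1, 0, or -1 if not a boolean word. */
static int parse_bool_word(const char *v)
{
	if (!strcasecmp(v, "true") || !strcasecmp(v, "yes") ||
	    !strcasecmp(v, "on") || !strcmp(v, "1"))
		return 1;
	if (!*v || !strcasecmp(v, "false") || !strcasecmp(v, "no") ||
	    !strcasecmp(v, "off") || !strcmp(v, "0"))
		return 0;
	return -1;
}

/* Returns 0 and sets *mode on success, or -1 for an unrecognized value. */
static int parse_fsync_object_files(const char *value,
				    enum fsync_object_files_mode *mode)
{
	int b;

	if (!value) { /* "fsyncObjectFiles" with no "=" counts as true */
		*mode = FSYNC_OBJECT_FILES_ON;
		return 0;
	}
	if (!strcasecmp(value, "batch")) {
		*mode = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = parse_bool_word(value);
	if (b < 0)
		return -1;
	*mode = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```

This sketch rejects the old numeric `2`. Whether to keep accepting it for compatibility with the RFC is a separate decision.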

> > When the new mode is enabled we do the following for new objects:
> >
> > 1. Create a tmp_obj_XXXX file and write the object data to it.
> > 2. Issue a pagecache writeback request and wait for it to complete.
> > 3. Record the tmp name and the final name in the bulk-checkin state for
> >    later rename.
> >
> > At the end of the entire transaction we:
> > 1. Issue a fsync against the lock file to flush the hardware writeback
> >    cache, which should by now have processed the tmp file writes.
> > 2. Rename all of the temp files to their final names.
> > 3. When updating the index and/or refs, we will issue another fsync
> >    internal to that operation.
> >
> > On a filesystem with a singular journal that is updated during name
> > operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> > would expect the fsync to trigger a journal writeout so that this
> > sequence is enough to ensure that the user's data is durable by the time
> > the git command returns.
> >
> > This change also updates the MacOS code to trigger a real hardware flush
> > via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> > MacOS there was no guarantee of durability since a simple fsync(2) call
> > does not flush any hardware caches.
>
> You included a very nice table with performance numbers in the cover
> letter. Maybe include that here, in the commit message?

Will do.

>
> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> > ---
> >  Documentation/config/core.txt |  17 ++++--
> >  Makefile                      |   4 ++
> >  builtin/add.c                 |   3 +-
> >  builtin/update-index.c        |   3 +
> >  bulk-checkin.c                | 105 +++++++++++++++++++++++++++++++---
> >  bulk-checkin.h                |   4 +-
> >  config.c                      |   4 +-
> >  config.mak.uname              |   2 +
> >  configure.ac                  |   8 +++
> >  git-compat-util.h             |   7 +++
> >  object-file.c                 |  12 +---
> >  wrapper.c                     |  36 ++++++++++++
> >  write-or-die.c                |   2 +-
> >  13 files changed, 177 insertions(+), 30 deletions(-)
> >
> > diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> > index c04f62a54a1..3b672c2db67 100644
> > --- a/Documentation/config/core.txt
> > +++ b/Documentation/config/core.txt
> > @@ -548,12 +548,17 @@ core.whitespace::
> >    errors. The default tab width is 8. Allowed values are 1 to 63.
> >
> >  core.fsyncObjectFiles::
> > -     This boolean will enable 'fsync()' when writing object files.
> > -+
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +     A boolean value or the number '2', indicating the level of durability
> > +     applied to object files.
> > ++
> > +This setting controls how much effort Git makes to ensure that data added to
> > +the object store are durable in the case of an unclean system shutdown. If
>
> In addition to the content, I also like a lot that this tempers down the
> language to be a lot more agreeable to read.
>
> > +'false', Git allows data to remain in file system caches according to operating
> > +system policy, whence they may be lost if the system loses power or crashes. A
> > +value of 'true' instructs Git to force objects to stable storage immediately
> > +when they are added to the object store. The number '2' is an experimental
> > +value that also preserves durability but tries to perform hardware flushes in a
> > +batch.

I'll be revising this text a little bit to respond to avarab's
feedback. I'm guessing that this will need a few more rounds of
tweaking.
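To make the three documented values concrete, here is a hypothetical,
standalone sketch of mapping the config value onto a mode. It is not
git's git_config_bool_or_int(), which the patch uses: that function
passes other integers through unchanged, so with the patch any nonzero
value other than 2 behaves like 'true'. This sketch rejects unknown
values instead, and the enum and function names are illustrative only.

```c
#include <string.h>
#include <strings.h>

/* Hypothetical names; the patch itself stores a plain int in fsync_object_files. */
enum fsync_object_mode {
	FSYNC_OBJECTS_INVALID = -1,
	FSYNC_OBJECTS_OFF = 0,   /* leave durability to OS cache policy */
	FSYNC_OBJECTS_ON = 1,    /* fsync every loose object immediately */
	FSYNC_OBJECTS_BATCH = 2  /* experimental: one hardware flush per batch */
};

static enum fsync_object_mode parse_fsync_object_files(const char *value)
{
	/* A bare "fsyncObjectFiles" key with no '=' counts as true. */
	if (!value)
		return FSYNC_OBJECTS_ON;
	/* An empty value ("fsyncObjectFiles =") counts as false. */
	if (!*value)
		return FSYNC_OBJECTS_OFF;
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return FSYNC_OBJECTS_ON;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return FSYNC_OBJECTS_OFF;
	if (!strcmp(value, "2"))
		return FSYNC_OBJECTS_BATCH;
	return FSYNC_OBJECTS_INVALID;
}
```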

> >
> >  core.preloadIndex::
> >       Enable parallel index preload for operations like 'git diff'
> > diff --git a/Makefile b/Makefile
> > index 9573190f1d7..cb950ee43d3 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1896,6 +1896,10 @@ ifdef HAVE_CLOCK_MONOTONIC
> >       BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
> >  endif
> >
> > +ifdef HAVE_SYNC_FILE_RANGE
> > +     BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
> > +endif
> > +
> >  ifdef NEEDS_LIBRT
> >       EXTLIBS += -lrt
> >  endif
> > diff --git a/builtin/add.c b/builtin/add.c
> > index 09e684585d9..c58dfcd4bc3 100644
> > --- a/builtin/add.c
> > +++ b/builtin/add.c
> > @@ -670,7 +670,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
> >
> >       if (chmod_arg && pathspec.nr)
> >               exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
> > -     unplug_bulk_checkin();
> > +
> > +     unplug_bulk_checkin(&lock_file);
> >
> >  finish:
> >       if (write_locked_index(&the_index, &lock_file,
> > diff --git a/builtin/update-index.c b/builtin/update-index.c
> > index f1f16f2de52..64d025cf49e 100644
> > --- a/builtin/update-index.c
> > +++ b/builtin/update-index.c
> > @@ -5,6 +5,7 @@
> >   */
> >  #define USE_THE_INDEX_COMPATIBILITY_MACROS
> >  #include "cache.h"
> > +#include "bulk-checkin.h"
> >  #include "config.h"
> >  #include "lockfile.h"
> >  #include "quote.h"
> > @@ -1152,6 +1153,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
> >               struct strbuf unquoted = STRBUF_INIT;
> >
> >               setup_work_tree();
> > +             plug_bulk_checkin();
> >               while (getline_fn(&buf, stdin) != EOF) {
> >                       char *p;
> >                       if (!nul_term_line && buf.buf[0] == '"') {
> > @@ -1166,6 +1168,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
> >                               chmod_path(set_executable_bit, p);
> >                       free(p);
> >               }
> > +             unplug_bulk_checkin(&lock_file);
> >               strbuf_release(&unquoted);
> >               strbuf_release(&buf);
> >       }
>
> This change to `cmd_update_index()`, would it make sense to separate it
> out into its own commit? I think it would, as it is a slight change of
> behavior of the `--stdin` mode, no?

Will do.

This makes me think that there is some risk here if someone launches an
update-index and then attempts to use the index to find a newly-added OID
without completing the update-index invocation. It's possible to construct
such a scenario if someone's using the "--verbose" option to find out about
actions taken by the subprocess. Do you think I should add a flag to
conditionally enable bulk-checkin to avoid a regression in this (I expect
unlikely) case?
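The visibility window described above can be modeled with a small,
hypothetical sketch: while plugged, the final rename is only queued, so
anyone who looks for the object before unplug will not find it. The
names (finalize_or_defer, unplug, and so on) are illustrative, not git's
API, and the single hardware flush the patch issues before renaming is
only marked with a comment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical miniature of the deferred-rename queue in this patch. */
#define MAX_DEFERRED 64

struct deferred_rename {
	char src[1024];
	char dst[1024];
};

static struct deferred_rename deferred[MAX_DEFERRED];
static int nr_deferred;
static int plugged;

static void plug(void)
{
	plugged = 1;
}

/* Move a fully written temp file to its final name, or defer it while plugged. */
static int finalize_or_defer(const char *tmp, const char *dst)
{
	if (!plugged)
		return rename(tmp, dst);
	if (nr_deferred == MAX_DEFERRED)
		return -1;
	snprintf(deferred[nr_deferred].src, sizeof(deferred[0].src), "%s", tmp);
	snprintf(deferred[nr_deferred].dst, sizeof(deferred[0].dst), "%s", dst);
	nr_deferred++;
	return 0;
}

/* The patch issues one hardware flush here, before any rename happens. */
static int unplug(void)
{
	int ret = 0;

	plugged = 0;
	for (int i = 0; i < nr_deferred; i++)
		if (rename(deferred[i].src, deferred[i].dst))
			ret = -1;
	nr_deferred = 0;
	return ret;
}
```

Between plug() and unplug(), a check such as access(dst, F_OK) fails even
though the object's bytes are already on disk under the temp name.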


> > diff --git a/bulk-checkin.c b/bulk-checkin.c
> > index b023d9959aa..71004db863e 100644
> > --- a/bulk-checkin.c
> > +++ b/bulk-checkin.c
> > @@ -3,6 +3,7 @@
> >   */
> >  #include "cache.h"
> >  #include "bulk-checkin.h"
> > +#include "lockfile.h"
> >  #include "repository.h"
> >  #include "csum-file.h"
> >  #include "pack.h"
> > @@ -10,6 +11,17 @@
> >  #include "packfile.h"
> >  #include "object-store.h"
> >
> > +struct object_rename {
> > +     char *src;
> > +     char *dst;
> > +};
> > +
> > +static struct bulk_rename_state {
> > +     struct object_rename *renames;
> > +     uint32_t alloc_renames;
> > +     uint32_t nr_renames;
> > +} bulk_rename_state;
> > +
> >  static struct bulk_checkin_state {
> >       unsigned plugged:1;
> >
> > @@ -21,13 +33,15 @@ static struct bulk_checkin_state {
> >       struct pack_idx_entry **written;
> >       uint32_t alloc_written;
> >       uint32_t nr_written;
> > -} state;
> > +
> > +} bulk_checkin_state;
>
> While it definitely looks better after this patch, having the new code
> _and_ the rename in the same set of changes makes it a bit harder to
> review and to spot bugs.
>
> Could I ask you to split this rename out into its own, preparatory patch
> ("preparatory" meaning that it should be ordered before the patch that
> adds support for the new fsync mode)?

Will do.  I'm going to change a few things as I'll mention below.

>
> >
> >  static void finish_bulk_checkin(struct bulk_checkin_state *state)
> >  {
> >       struct object_id oid;
> >       struct strbuf packname = STRBUF_INIT;
> >       int i;
> > +     unsigned old_plugged;
>
> Since this variable is designed to hold the value of the `plugged` field
> of the `bulk_checkin_state`, which is declared as `unsigned plugged:1;`,
> we probably want a `:1` here, too.
>
> Also: is it really "old", rather than "orig"? I would have expected the
> name `orig_plugged` or `save_plugged`.
>
> >
> >       if (!state->f)
> >               return;
> > @@ -55,13 +69,42 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
> >
> >  clear_exit:
> >       free(state->written);
> > +     old_plugged = state->plugged;
> >       memset(state, 0, sizeof(*state));
> > +     state->plugged = old_plugged;
>
> Unfortunately, I lack the context to understand the purpose of this. Is
> the idea that `plugged` gives an indication whether we're still within
> that batch that should be fsync'ed all at once?
>
> I only see one caller where this would make a difference, and that caller
> is `deflate_to_pack()`. Maybe we should just start that function with
> `unsigned save_plugged:1 = state->plugged;` and restore it after the
> `while (1)` loop?
>

The problem being solved here is that in some (rare) circumstances it
looks like we could unplug the state before an explicit unplug call. That
would be fatal for the bulk-fsync code, and is probably suboptimal for the
bulk checkin code. In my preparatory patch I'm going to separate the
boolean out into a different variable so that this fragility goes away.
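A hypothetical sketch of that separation: once the plug flag lives
outside the per-pack struct, finishing a pack mid-batch can memset() the
struct without touching the flag. The struct and field names here are
illustrative stand-ins, not the fields of the real bulk_checkin_state.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-pack part of bulk_checkin_state. */
struct bulk_pack_state {
	char *pack_tmp_name;
	unsigned nr_written;
};

static struct bulk_pack_state bulk_pack_state;

/* The plug flag lives outside the struct, so resetting the struct cannot clear it. */
static int bulk_checkin_plugged;

static void finish_pack(struct bulk_pack_state *state)
{
	free(state->pack_tmp_name);
	memset(state, 0, sizeof(*state));
}

static void plug(void)
{
	bulk_checkin_plugged = 1;
}

static void unplug(void)
{
	bulk_checkin_plugged = 0;
	finish_pack(&bulk_pack_state);
}
```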

> >
> >       strbuf_release(&packname);
> >       /* Make objects we just wrote available to ourselves */
> >       reprepare_packed_git(the_repository);
> >  }
> >
> > +static void do_sync_and_rename(struct bulk_rename_state *state, struct lock_file *lock_file)
> > +{
> > +     if (state->nr_renames) {
> > +             int i;
> > +
> > +             /*
> > +              * Issue a full hardware flush against the lock file to ensure
> > +              * that all objects are durable before any renames occur.
> > +              * The code in fsync_and_close_loose_object_bulk_checkin has
> > +              * already ensured that writeout has occurred, but it has not
> > +              * flushed any writeback cache in the storage hardware.
> > +              */
> > +             fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
> > +
> > +             for (i = 0; i < state->nr_renames; i++) {
> > +                     if (finalize_object_file(state->renames[i].src, state->renames[i].dst))
> > +                             die_errno(_("could not rename '%s'"), state->renames[i].src);
> > +
> > +                     free(state->renames[i].src);
> > +                     free(state->renames[i].dst);
> > +             }
> > +
> > +             free(state->renames);
> > +             memset(state, 0, sizeof(*state));
>
> Hmm. There is a lot of `memset()`ing going on, and I am not quite sure
> that I like what I am seeing. It does not help that there are now two very
> easily-confused structs: `bulk_rename_state` and `bulk_checkin_state`.
> Which made me worried at first that we might be resetting the `renames`
> field inadvertently in `finish_bulk_checkin()`.

Yeah, the problem I was trying to avoid was an issue with resetting the
rename state during the "too big pack" case. What do you think about a
"bulk_pack_state" and a "bulk_fsync_state" variable? One side advantage
of the "s/state/bulk_rename_state" change is that the variable name is no
longer ambiguous in the debugger across different object files.

>
> Maybe we can do this instead?
>
>                 FREE_AND_NULL(state->renames);
>                 state->nr_renames = state->alloc_renames = 0;
>

I wrote it that way originally, but saw an error with coccinelle and then
decided to follow the convention from elsewhere in the file. That's when I
noticed the nasty "early unplug" case, so the GitHub Actions certainly
saved me from an unfortunate bug. Thanks for setting them up!

I'll go toward your suggestion.

> > +     }
> > +}
> > +
> >  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
> >  {
> >       int i;
> > @@ -256,25 +299,69 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
> >       return 0;
> >  }
> >
> > +static void add_rename_bulk_checkin(struct bulk_rename_state *state,
> > +                                 const char *src, const char *dst)
> > +{
> > +     struct object_rename *rename;
> > +
> > +     ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
> > +
> > +     rename = &state->renames[state->nr_renames++];
> > +     rename->src = xstrdup(src);
> > +     rename->dst = xstrdup(dst);
> > +}
> > +
> > +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
> > +                                           const char *filename)
> > +{
> > +     if (fsync_object_files) {
> > +             /*
> > +              * If we have a plugged bulk checkin, we issue a call that
> > +              * cleans the filesystem page cache but avoids a hardware flush
> > +              * command. Later on we will issue a single hardware flush
> > +              * before renaming files as part of do_sync_and_rename.
> > +              */
> > +             if (bulk_checkin_state.plugged &&
> > +                 fsync_object_files == 2 &&
> > +                 git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
> > +                     add_rename_bulk_checkin(&bulk_rename_state, tmpfile, filename);
> > +                     if (close(fd))
> > +                             die_errno(_("error when closing loose object file"));
> > +
> > +                     return 0;
> > +
> > +             } else {
> > +                     fsync_or_die(fd, "loose object file");
> > +             }
> > +     }
> > +
> > +     if (close(fd))
> > +             die_errno(_("error when closing loose object file"));
> > +
> > +     return finalize_object_file(tmpfile, filename);
> > +}
> > +
> >  int index_bulk_checkin(struct object_id *oid,
> >                      int fd, size_t size, enum object_type type,
> >                      const char *path, unsigned flags)
> >  {
> > -     int status = deflate_to_pack(&state, oid, fd, size, type,
> > +     int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
> >                                    path, flags);
> > -     if (!state.plugged)
> > -             finish_bulk_checkin(&state);
> > +     if (!bulk_checkin_state.plugged)
> > +             finish_bulk_checkin(&bulk_checkin_state);
> >       return status;
> >  }
> >
> >  void plug_bulk_checkin(void)
> >  {
> > -     state.plugged = 1;
> > +     bulk_checkin_state.plugged = 1;
> >  }
> >
> > -void unplug_bulk_checkin(void)
> > +void unplug_bulk_checkin(struct lock_file *lock_file)
> >  {
> > -     state.plugged = 0;
> > -     if (state.f)
> > -             finish_bulk_checkin(&state);
> > +     bulk_checkin_state.plugged = 0;
> > +     if (bulk_checkin_state.f)
> > +             finish_bulk_checkin(&bulk_checkin_state);
> > +
> > +     do_sync_and_rename(&bulk_rename_state, lock_file);
> >  }
> > diff --git a/bulk-checkin.h b/bulk-checkin.h
> > index b26f3dc3b74..8efb01ed669 100644
> > --- a/bulk-checkin.h
> > +++ b/bulk-checkin.h
> > @@ -6,11 +6,13 @@
> >
> >  #include "cache.h"
> >
> > +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile, const char *filename);
> > +
> >  int index_bulk_checkin(struct object_id *oid,
> >                      int fd, size_t size, enum object_type type,
> >                      const char *path, unsigned flags);
> >
> >  void plug_bulk_checkin(void);
> > -void unplug_bulk_checkin(void);
> > +void unplug_bulk_checkin(struct lock_file *);
> >
> >  #endif
> > diff --git a/config.c b/config.c
> > index f33abeab851..375bdb24b0a 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1509,7 +1509,9 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
> >       }
> >
> >       if (!strcmp(var, "core.fsyncobjectfiles")) {
> > -             fsync_object_files = git_config_bool(var, value);
> > +             int is_bool;
> > +
> > +             fsync_object_files = git_config_bool_or_int(var, value, &is_bool);
> >               return 0;
> >       }
> >
> > diff --git a/config.mak.uname b/config.mak.uname
> > index 69413fb3dc0..8c07f2265a8 100644
> > --- a/config.mak.uname
> > +++ b/config.mak.uname
> > @@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
> >       HAVE_CLOCK_MONOTONIC = YesPlease
> >       # -lrt is needed for clock_gettime on glibc <= 2.16
> >       NEEDS_LIBRT = YesPlease
> > +     HAVE_SYNC_FILE_RANGE = YesPlease
> >       HAVE_GETDELIM = YesPlease
> >       SANE_TEXT_GREP=-a
> >       FREAD_READS_DIRECTORIES = UnfortunatelyYes
> > @@ -133,6 +134,7 @@ ifeq ($(uname_S),Darwin)
> >       COMPAT_OBJS += compat/precompose_utf8.o
> >       BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
> >       BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1
> > +     BASIC_CFLAGS += -DFSYNC_DOESNT_FLUSH=1
> >       HAVE_BSD_SYSCTL = YesPlease
> >       FREAD_READS_DIRECTORIES = UnfortunatelyYes
> >       HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
> > diff --git a/configure.ac b/configure.ac
> > index 031e8d3fee8..c711037d625 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
> >       [AC_MSG_RESULT([no])
> >       HAVE_CLOCK_MONOTONIC=])
> >  GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
> > +
> > +#
> > +# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
> > +GIT_CHECK_FUNC(sync_file_range,
> > +     [HAVE_SYNC_FILE_RANGE=YesPlease],
> > +     [HAVE_SYNC_FILE_RANGE])
> > +GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
> > +
> >  #
> >  # Define NO_SETITIMER if you don't have setitimer.
> >  GIT_CHECK_FUNC(setitimer,
> > diff --git a/git-compat-util.h b/git-compat-util.h
> > index b46605300ab..d14e2436276 100644
> > --- a/git-compat-util.h
> > +++ b/git-compat-util.h
> > @@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
> >  void BUG(const char *fmt, ...);
> >  #endif
> >
> > +enum fsync_action {
> > +    FSYNC_WRITEOUT_ONLY,
> > +    FSYNC_HARDWARE_FLUSH
> > +};
> > +
> > +int git_fsync(int fd, enum fsync_action action);
> > +
> >  /*
> >   * Preserves errno, prints a message, but gives no warning for ENOENT.
> >   * Returns 0 on success, which includes trying to unlink an object that does
> > diff --git a/object-file.c b/object-file.c
> > index 607e9e2f80b..5f04143dde0 100644
> > --- a/object-file.c
> > +++ b/object-file.c
> > @@ -1859,16 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
> >       return 0;
> >  }
> >
> > -/* Finalize a file on disk, and close it. */
> > -static int close_loose_object(int fd, const char *tmpfile, const char *filename)
> > -{
> > -     if (fsync_object_files)
> > -             fsync_or_die(fd, "loose object file");
> > -     if (close(fd) != 0)
> > -             die_errno(_("error when closing loose object file"));
> > -     return finalize_object_file(tmpfile, filename);
> > -}
> > -
> >  /* Size of directory component, including the ending '/' */
> >  static inline int directory_size(const char *filename)
> >  {
> > @@ -1982,7 +1972,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
> >                       warning_errno(_("failed futimes() on %s"), tmp_file.buf);
> >       }
> >
> > -     return close_loose_object(fd, tmp_file.buf, filename.buf);
> > +     return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf, filename.buf);
> >  }
> >
> >  static int freshen_loose_object(const struct object_id *oid)
> > diff --git a/wrapper.c b/wrapper.c
> > index 563ad590df1..37a8b61a7df 100644
> > --- a/wrapper.c
> > +++ b/wrapper.c
> > @@ -538,6 +538,42 @@ int xmkstemp_mode(char *filename_template, int mode)
> >       return fd;
> >  }
> >
> > +int git_fsync(int fd, enum fsync_action action)
> > +{
> > +     if (action == FSYNC_WRITEOUT_ONLY) {
> > +#ifdef __APPLE__
> > +             /*
> > +              * on Mac OS X, fsync just causes filesystem cache writeback but does not
> > +              * flush hardware caches.
> > +              */
> > +             return fsync(fd);
> > +#endif
> > +
> > +#ifdef HAVE_SYNC_FILE_RANGE
> > +             /*
> > +              * On linux 2.6.17 and above, sync_file_range is the way to issue
> > +              * a writeback without a hardware flush. An offset of 0 and size of 0
> > +              * indicates writeout of the entire file and the wait flags ensure that all
> > +              * dirty data is written to the disk (potentially in a disk-side cache)
> > +              * before we continue.
> > +              */
> > +
> > +             return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
> > +                                              SYNC_FILE_RANGE_WRITE |
> > +                                              SYNC_FILE_RANGE_WAIT_AFTER);
> > +#endif
> > +
> > +             errno = ENOSYS;
> > +             return -1;
> > +     }
>
> Hmm. I wonder whether we can do this more consistently with how Git
> usually does platform-specific things.
>
> In the 3rd patch, the one where you implemented Windows-specific support,
> in the Git for Windows PR at
> https://github.com/git-for-windows/git/pull/3391, you introduce a
> `mingw_fsync_no_flush()` function and define `fsync_no_flush` to expand to
> that function name.
>
> This is very similar to how Git does things. Take for example the
> `offset_1st_component` macro:
> https://github.com/git/git/blob/v2.33.0/git-compat-util.h#L386-L392
>
> Unless defined in a platform-specific manner, it is defined in
> `git-compat-util.h`:
>
>         #ifndef offset_1st_component
>         static inline int git_offset_1st_component(const char *path)
>         {
>                 return is_dir_sep(path[0]);
>         }
>         #define offset_1st_component git_offset_1st_component
>         #endif
>
> And on Windows, it is defined as following
> (https://github.com/git/git/blob/v2.33.0/compat/win32/path-utils.h#L34-L35),
> before the lines quoted above:
>
>         int win32_offset_1st_component(const char *path);
>         #define offset_1st_component win32_offset_1st_component
>
> We could do the exact same thing here. Define a platform-specific
> `mingw_fsync_no_flush()` in `compat/mingw.h` and define the macro
> `fsync_no_flush` to point to it. In `git-compat-util.h`, in the
> `__APPLE__`-specific part, implement it via `fsync()`. And later, in the
> platform-independent part, _iff_ the macro has not yet been defined,
> implement an inline function that does that `HAVE_SYNC_FILE_RANGE` dance
> and falls back to `ENOSYS`.
>
> That would contain the platform-specific `#ifdef` blocks to
> `git-compat-util.h`, which is exactly where we want them.
>
> > +
> > +#ifdef __APPLE__
> > +     return fcntl(fd, F_FULLFSYNC);
> > +#else
> > +     return fsync(fd);
> > +#endif
>
> Same thing here. We would probably want something like `fsync_with_flush`
> here.

I thought about doing it that way originally. But there's the unfortunate
fact that I'd have to alias fsync_no_flush to fsync on macOS and then
fsync to fcntl(F_FULLFSYNC). I felt that it would be clearer to someone
reviewing this functionality if we provide a very explicit git_fsync API
with a well-named flag and document the OS-specific craziness in the C
file rather than through a layer of macros in the header file.

Given that, are you okay with keeping this code layout in the C file,
potentially with more local modifications?
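For comparison, the override-macro layout Johannes describes might look
roughly like the following if collapsed into one place. This is a
hypothetical sketch: fsync_no_flush and git_fsync_no_flush follow his
naming, HAVE_SYNC_FILE_RANGE is the Makefile knob from this patch, and in
git the Windows and macOS variants would live in their compat headers.

```c
#define _GNU_SOURCE  /* for sync_file_range() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * A platform header (e.g. compat/mingw.h) may define fsync_no_flush
 * first; otherwise this generic fallback is used.
 */
#ifndef fsync_no_flush
static inline int git_fsync_no_flush(int fd)
{
#if defined(__APPLE__)
	/* macOS fsync() writes back the page cache but does not flush the drive cache. */
	return fsync(fd);
#elif defined(HAVE_SYNC_FILE_RANGE)
	/* Write back dirty pages and wait; no hardware flush, no metadata. */
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				       SYNC_FILE_RANGE_WRITE |
				       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	(void)fd;
	errno = ENOSYS;
	return -1;
#endif
}
#define fsync_no_flush git_fsync_no_flush
#endif
```

The trade-off Neeraj raises is visible here: the macOS branch maps the
"no flush" name onto plain fsync(), which readers may find surprising
without the comment.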

>
> It is my hope that you find my comments and suggestions helpful.
>
> Thank you,
> Johannes
>

Very much so! Again thanks for the review.

-Neeraj

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-26  0:49     ` Neeraj Singh
@ 2021-08-26  5:50       ` Christoph Hellwig
  2021-08-28  0:20         ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Christoph Hellwig @ 2021-08-26  5:50 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Ævar Arnfjörð Bjarmason,
	Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Neeraj Singh

On Wed, Aug 25, 2021 at 05:49:45PM -0700, Neeraj Singh wrote:
> Unfortunately my perusal of the man pages and documentation I could find
> doesn't give me this level of confidence on typical Linux filesystems. For
> instance, the notion of having to fsync the parent directory in order to
> render an inode's link findable eliminates a lot of the advantage of this
> change, though we could batch those and would have to do at most 256.
> 
> This thread is somewhat instructive, but inconclusive:
> https://lwn.net/ml/linux-fsdevel/1552418820-18102-1-git-send-email-jaya@cs.utexas.edu/.

fsync/fdatasync only guarantees consistency for the file handle they
are called on.  The first linked document mentioned an implementation
artifact that file systems with metadata logging tend to force their
log out until the last modified transaction and thus force out metadata
changes done earlier.  This won't help with actual data writes at all,
as for them the fact of writing back data will often generate new
metadata changes, and in general is not a property to rely on if you
care about data integrity.  It is nice to optimize the order of the
fsync calls for metadata only workloads, as then often the later fsync
calls on earlier modified file handles will be no-ops.

> One conclusion from reviewing that thread is that as of then,
> sync_file_ranges isn't actually enough to make a hard guarantee about
> writeout occurring. See
> https://lore.kernel.org/linux-fsdevel/20190319204330.GY26298@dastard/.
> My hope is that the Linux FS developers have rectified that shortcoming
> by now.

I'm not sure what shortcoming you mean.  sync_file_range is a system
call that only causes data writeback.  It never performs metadata write
back and thus is not an integrity operation at all.  That is also very
clearly documented in the man page.
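One way to use sync_file_range() that stays consistent with this is as a
writeback hint in front of ordinary fsync() calls: start writeback on
every file in the batch, then fsync() each one. This is a generic
technique, not what the patch does. It keeps per-file integrity but
still issues one fsync() per file, so at best it overlaps the I/O and
does not reduce the number of cache flushes; whether it helps depends on
the filesystem and device. The helper name is hypothetical, and the hint
pass is Linux-only.

```c
#define _GNU_SOURCE  /* for sync_file_range() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Flush a batch of already-written descriptors, each with full fsync() integrity. */
static int flush_batch(const int *fds, int n)
{
	int i;

#ifdef __linux__
	/*
	 * Pass 1: kick off writeback without waiting. This is a performance
	 * hint only; it gives no integrity guarantee, so errors are ignored.
	 */
	for (i = 0; i < n; i++)
		(void)sync_file_range(fds[i], 0, 0, SYNC_FILE_RANGE_WRITE);
#endif

	/* Pass 2: the actual integrity operation, one per file. */
	for (i = 0; i < n; i++)
		if (fsync(fds[i]) < 0)
			return -1;
	return 0;
}
```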

> I think my updated version of the documentation for "= false" is accurate
> and more helpful from a user perspective ("up to OS policy when your data
> becomes durable in the event of an unclean shutdown").  "= true" also has a
> reasonable description, though I might add some verbiage indicating that
> this setting could be costly.

Your version is much better.  In fact it is almost still too nice, as in
general it will not be durable and you do end up with a corrupted
repository in that case.  Note that even for bad old ext3 that was
usually the case.

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25 17:40     ` Neeraj Singh
@ 2021-08-26  5:54       ` Christoph Hellwig
  0 siblings, 0 replies; 160+ messages in thread
From: Christoph Hellwig @ 2021-08-26  5:54 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Christoph Hellwig, Neeraj Singh via GitGitGadget, Git List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Wed, Aug 25, 2021 at 10:40:53AM -0700, Neeraj Singh wrote:
> I'd expect syncfs to suffer from the noisy-neighbor problem that Linus
> alluded to on the big thread you kicked off.

It does.  That being said I suspect in most developer workstation
use cases it will still be a win.  Maybe I'll look into implementing
it after your series lands.
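For reference, a syncfs()-based batch flush on Linux could be as small as
the sketch below: write and close all objects normally, then make one
syncfs() call on a descriptor inside the object directory. It flushes the
whole filesystem, which is where the noisy-neighbor cost comes from. The
helper name is hypothetical, and the non-Linux branch falls back to
sync(), which POSIX does not require to wait for completion.

```c
#define _GNU_SOURCE  /* for syncfs() on glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Flush the entire filesystem containing objdir with a single call. */
static int flush_object_fs(const char *objdir)
{
	int dfd = open(objdir, O_RDONLY | O_DIRECTORY);
	int ret, saved_errno;

	if (dfd < 0)
		return -1;
#ifdef __linux__
	ret = syncfs(dfd);   /* Linux 2.6.39+, glibc 2.14+ */
#else
	sync();              /* may return before the writes complete */
	ret = 0;
#endif
	saved_errno = errno;
	close(dfd);
	errno = saved_errno;
	return ret;
}
```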

> If someone adds a more targeted bulk sync interface to the Linux kernel,
> I'm sure Git could be changed to use it. Maybe an fcntl(2) interface that
> initiates writeback and registers completion with an eventfd.

That is in general very hard to do with how the VM-level writeback
occurs.  In the file system itself it could work much better, e.g.
for XFS we write the log up to a specific sequence number and could
notify when doing that.

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
  2021-08-26  0:49     ` Neeraj Singh
@ 2021-08-26  5:57     ` Christoph Hellwig
  1 sibling, 0 replies; 160+ messages in thread
From: Christoph Hellwig @ 2021-08-26  5:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, git, Neeraj-Personal,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Neeraj Singh

On Wed, Aug 25, 2021 at 06:11:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 3) Re some of the musings about fsync() recently in
> https://lore.kernel.org/git/877dhs20x3.fsf@evledraar.gmail.com/; is this
> method of doing not-quite-an-fsync guaranteed by some OS's / POSIX etc,
> or is it more like the initial approach before core.fsyncObjectFiles,
> i.e. the happy-go-lucky approach described in the "[...]that orders data
> writes properly[...]" documentation you're removing.

Except for the now-removed ext3 filesystem in Linux, which basically
turned every fsync into a syncfs (that is, a file system-wide sync), I've
never heard of such behavior for data writeback.  Many file systems will
sometimes or always behave like that for metadata writeback, but there
are no guarantees you could rely on for that.

* [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-08-25 16:58 ` [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj Singh
@ 2021-08-27 23:49 ` Neeraj K. Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
                     ` (7 more replies)
  3 siblings, 8 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh

Thanks to everyone for review so far! I've responded to the previous
feedback and changed the patch series a bit.

Changes since v1:

 * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
   to dscho's suggestion, I'm still implementing the Windows version in the
   same patch and I'm not doing autoconf detection since this is a POSIX
   function.

 * Introduce a separate preparatory patch to the bulk-checkin infrastructure
   to separate the 'plugged' variable and rename the 'state' variable, as
   suggested by dscho.

 * Add performance numbers to the commit message of the main bulk fsync
   patch, as suggested by dscho.

 * Add a comment about the non-thread-safety of the bulk-checkin
   infrastructure, as suggested by avarab.

 * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
   by dscho and avarab and others.

 * Add more details to Documentation/config/core.txt about the various
   settings and their intended effects, as suggested by avarab.

 * Switch to the string-list API to hold the rename state, as suggested by
   avarab.

 * Create a separate update-index patch to use bulk-checkin as suggested by
   dscho.

 * Add Windows support in the upstream git. This is done in a way that
   should not conflict with git-for-windows.

 * Add new performance tests that show the delta between fsync modes.

NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
on Linux, since sync_file_range does not provide data integrity guarantees.
There is currently no kernel interface suitable to achieve disk flush
batching as is, but he suggested that he might implement a 'syncfs' variant
on top of this patchset. This code is still useful on macOS and Windows, and
the config documentation makes that clear.

Neeraj Singh (6):
  object-file: use futimens rather than utime
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       | 26 ++++++--
 Makefile                            |  6 ++
 builtin/add.c                       |  3 +-
 builtin/update-index.c              |  3 +
 bulk-checkin.c                      | 92 +++++++++++++++++++++++++----
 bulk-checkin.h                      |  4 +-
 cache.h                             |  8 ++-
 compat/mingw.c                      | 53 +++++++++++------
 compat/mingw.h                      |  5 ++
 compat/win32/flush.c                | 29 +++++++++
 config.c                            |  8 ++-
 config.mak.uname                    |  4 ++
 configure.ac                        |  8 +++
 contrib/buildsystems/CMakeLists.txt |  3 +-
 environment.c                       |  2 +-
 git-compat-util.h                   |  7 +++
 object-file.c                       | 23 ++------
 t/perf/lib-unique-files.sh          | 32 ++++++++++
 t/perf/p3700-add.sh                 | 43 ++++++++++++++
 t/perf/p3900-stash.sh               | 46 +++++++++++++++
 wrapper.c                           | 40 +++++++++++++
 write-or-die.c                      |  2 +-
 22 files changed, 389 insertions(+), 58 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/perf/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v2
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v1:

 1:  2c1ddef6057 ! 1:  fc3d5a7b635 object-file: use futimes rather than utime
     @@ Metadata
      Author: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Commit message ##
     -    object-file: use futimes rather than utime
     +    object-file: use futimens rather than utime
      
     -    Refactor the loose object file creation code and use the futimes(2) API
     -    rather than utime. This should be slightly faster given that we already
     -    have an FD to work with.
     +    Make close_loose_object do all of the steps for syncing and correctly
     +    naming a new loose object so that it can be reimplemented in the
     +    upcoming bulk-fsync mode.
     +
     +    Use futimens, which is available in POSIX.1-2008 to update the file
     +    timestamps. This should be slightly faster than utime, since we have
     +    a file descriptor already available. This change allows us to update
     +    the time before closing, renaming, and potentially fsyincing the file
     +    being refreshed. This code is currently only invoked by git-pack-objects
     +    via force_object_loose.
     +
     +    Implement a futimens shim for the Windows port of Git.
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
       ## compat/mingw.c ##
     +@@ compat/mingw.c: int mingw_chmod(const char *filename, int mode)
     +  * The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
     +  * Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
     +  */
     ++
     ++#define UNIX_EPOCH_FILETIME 116444736000000000LL
     ++
     + static inline long long filetime_to_hnsec(const FILETIME *ft)
     + {
     + 	long long winTime = ((long long)ft->dwHighDateTime << 32) + ft->dwLowDateTime;
     + 	/* Windows to Unix Epoch conversion */
     +-	return winTime - 116444736000000000LL;
     ++	return winTime - UNIX_EPOCH_FILETIME;
     + }
     + 
     + static inline void filetime_to_timespec(const FILETIME *ft, struct timespec *ts)
     +@@ compat/mingw.c: static inline void filetime_to_timespec(const FILETIME *ft, struct timespec *ts)
     + 	ts->tv_nsec = (hnsec % 10000000) * 100;
     + }
     + 
     ++static inline void timespec_to_filetime(const struct timespec *t, FILETIME *ft)
     ++{
     ++	long long winTime = t->tv_sec * 10000000LL + t->tv_nsec / 100 + UNIX_EPOCH_FILETIME;
     ++	ft->dwLowDateTime = winTime;
     ++	ft->dwHighDateTime = winTime >> 32;
     ++}
     ++
     + /**
     +  * Verifies that safe_create_leading_directories() would succeed.
     +  */
      @@ compat/mingw.c: int mingw_fstat(int fd, struct stat *buf)
       	}
       }
       
      -static inline void time_t_to_filetime(time_t t, FILETIME *ft)
     -+static inline void timeval_to_filetime(const struct timeval *t, FILETIME *ft)
     ++int mingw_futimens(int fd, const struct timespec times[2])
       {
      -	long long winTime = t * 10000000LL + 116444736000000000LL;
     -+	long long winTime = t->tv_sec * 10000000LL + t->tv_usec * 10 + 116444736000000000LL;
     - 	ft->dwLowDateTime = winTime;
     - 	ft->dwHighDateTime = winTime >> 32;
     - }
     - 
     --int mingw_utime (const char *file_name, const struct utimbuf *times)
     -+int mingw_futimes(int fd, const struct timeval times[2])
     - {
     - 	FILETIME mft, aft;
     +-	ft->dwLowDateTime = winTime;
     +-	ft->dwHighDateTime = winTime >> 32;
     ++	FILETIME mft, aft;
      +
      +	if (times) {
     -+		timeval_to_filetime(&times[0], &aft);
     -+		timeval_to_filetime(&times[1], &mft);
     ++		timespec_to_filetime(&times[0], &aft);
     ++		timespec_to_filetime(&times[1], &mft);
      +	} else {
      +		GetSystemTimeAsFileTime(&mft);
      +		aft = mft;
     @@ compat/mingw.c: int mingw_fstat(int fd, struct stat *buf)
      +	}
      +
      +	return 0;
     -+}
     -+
     -+int mingw_utime (const char *file_name, const struct utimbuf *times)
     -+{
     + }
     + 
     +-int mingw_utime (const char *file_name, const struct utimbuf *times)
     ++int mingw_utime(const char *file_name, const struct utimbuf *times)
     + {
     +-	FILETIME mft, aft;
       	int fh, rc;
       	DWORD attrs;
       	wchar_t wfilename[MAX_PATH];
     -+	struct timeval tvs[2];
     ++	struct timespec ts[2];
      +
       	if (xutftowcs_path(wfilename, file_name) < 0)
       		return -1;
     @@ compat/mingw.c: int mingw_utime (const char *file_name, const struct utimbuf *ti
      -	} else {
      -		GetSystemTimeAsFileTime(&mft);
      -		aft = mft;
     -+		memset(tvs, 0, sizeof(tvs));
     -+		tvs[0].tv_sec = times->actime;
     -+		tvs[1].tv_sec = times->modtime;
     ++		memset(ts, 0, sizeof(ts));
     ++		ts[0].tv_sec = times->actime;
     ++		ts[1].tv_sec = times->modtime;
       	}
      -	if (!SetFileTime((HANDLE)_get_osfhandle(fh), NULL, &aft, &mft)) {
      -		errno = EINVAL;
     @@ compat/mingw.c: int mingw_utime (const char *file_name, const struct utimbuf *ti
      -	} else
      -		rc = 0;
      +
     -+	rc = mingw_futimes(fh, times ? tvs : NULL);
     ++	rc = mingw_futimens(fh, times ? ts : NULL);
       	close(fh);
       
       revert_attrs:
     @@ compat/mingw.h: int mingw_fstat(int fd, struct stat *buf);
       
       int mingw_utime(const char *file_name, const struct utimbuf *times);
       #define utime mingw_utime
     -+int mingw_futimes(int fd, const struct timeval times[2]);
     -+#define futimes mingw_futimes
     ++int mingw_futimens(int fd, const struct timespec times[2]);
     ++#define futimens mingw_futimens
       size_t mingw_strftime(char *s, size_t max,
       		   const char *format, const struct tm *tm);
       #define strftime mingw_strftime
     @@ object-file.c: static int write_loose_object(const struct object_id *oid, char *
      -		utb.modtime = mtime;
      -		if (utime(tmp_file.buf, &utb) < 0)
      -			warning_errno(_("failed utime() on %s"), tmp_file.buf);
     -+		struct timeval tvs[2] = {0};
     -+		tvs[0].tv_sec = mtime;
     -+		tvs[1].tv_sec = mtime;
     -+		if (futimes(fd, tvs) < 0)
     ++		struct timespec ts[2] = {0};
     ++		ts[0].tv_sec = mtime;
     ++		ts[1].tv_sec = mtime;
     ++		if (futimens(fd, ts) < 0)
      +			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
       	}
       
 -:  ----------- > 2:  49f72800bfb bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 2:  d1e68d4a2af ! 3:  2c1c907b12a core.fsyncobjectfiles: batch disk flushes
     @@ Metadata
      Author: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Commit message ##
     -    core.fsyncobjectfiles: batch disk flushes
     +    core.fsyncobjectfiles: batched disk flushes
      
          When adding many objects to a repo with core.fsyncObjectFiles set to
          true, the cost of fsync'ing each object file can become prohibitive.
      
          One major source of the cost of fsync is the implied flush of the
          hardware writeback cache within the disk drive. Fortunately, Windows,
     -    MacOS, and Linux each offer mechanisms to write data from the filesystem
     +    macOS, and Linux each offer mechanisms to write data from the filesystem
          page cache without initiating a hardware flush.
      
     -    This patch introduces a new 'core.fsyncObjectFiles = 2' option that
     +    This patch introduces a new 'core.fsyncObjectFiles = batch' option that
          takes advantage of the bulk-checkin infrastructure to batch up hardware
          flushes.
      
     @@ Commit message
          1. Create a tmp_obj_XXXX file and write the object data to it.
          2. Issue a pagecache writeback request and wait for it to complete.
          3. Record the tmp name and the final name in the bulk-checkin state for
     -       later name.
     +       later rename.
      
          At the end of the entire transaction we:
          1. Issue a fsync against the lock file to flush the hardware writeback
             cache, which should by now have processed the tmp file writes.
          2. Rename all of the temp files to their final names.
     -    3. When updating the index and/or refs, we will issue another fsync
     -       internal to that operation.
     +    3. When updating the index and/or refs, we assume that Git will issue
     +       another fsync internal to that operation.
      
          On a filesystem with a singular journal that is updated during name
          operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
     @@ Commit message
          sequence is enough to ensure that the user's data is durable by the time
          the git command returns.
      
     -    This change also updates the MacOS code to trigger a real hardware flush
     +    This change also updates the macOS code to trigger a real hardware flush
          via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
     -    MacOS there was no guarantee of durability since a simple fsync(2) call
     +    macOS there was no guarantee of durability since a simple fsync(2) call
          does not flush any hardware caches.
      
     +    _Performance numbers_:
     +
     +    Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
     +    Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
     +    Windows - Same host as Linux, a preview version of Windows 11.
     +              This number is from a patch later in the series.
     +
     +    Adding 500 files to the repo with 'git add'. Times reported in seconds.
     +
     +    core.fsyncObjectFiles | Linux | Mac   | Windows
     +    ----------------------|-------|-------|--------
     +                    false | 0.06  |  0.35 | 0.61
     +                    true  | 1.88  | 11.18 | 2.47
     +                    batch | 0.15  |  0.41 | 1.53
     +
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Documentation/config/core.txt ##
     @@ Documentation/config/core.txt: core.whitespace::
      -data writes properly, but can be useful for filesystems that do not use
      -journalling (traditional UNIX filesystems) or that only journal metadata
      -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
     -+	A boolean value or the number '2', indicating the level of durability
     -+	applied to object files.
     ++	A value indicating the level of effort Git will expend in
     ++	trying to make objects added to the repo durable in the event
     ++	of an unclean system shutdown. This setting currently only
     ++	controls the object store, so updates to any refs or the
     ++	index may not be equally durable.
      ++
     -+This setting controls how much effort Git makes to ensure that data added to
     -+the object store are durable in the case of an unclean system shutdown. If
     -+'false', Git allows data to remain in file system caches according to operating
     -+system policy, whence they may be lost if the system loses power or crashes. A
     -+value of 'true' instructs Git to force objects to stable storage immediately
     -+when they are added to the object store. The number '2' is an experimental
     -+value that also preserves durability but tries to perform hardware flushes in a
     -+batch.
     ++* `false` allows data to remain in file system caches according to
     ++  operating system policy, whence it may be lost if the system loses power
     ++  or crashes.
     ++* `true` triggers a data integrity flush for each object added to the
     ++  object store. This is the safest setting that is likely to ensure durability
     ++  across all operating systems and file systems that honor the 'fsync' system
     ++  call. However, this setting comes with a significant performance cost on
     ++  common hardware.
     ++* `batch` enables an experimental mode that uses interfaces available in some
     ++  operating systems to write object data with a minimal set of FLUSH CACHE
     ++  (or equivalent) commands sent to the storage controller. If the operating
     ++  system interfaces are not available, this mode behaves the same as `true`.
     ++  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
     ++  filesystems and on Windows for repos stored on NTFS or ReFS.
       
       core.preloadIndex::
       	Enable parallel index preload for operations like 'git diff'
      
       ## Makefile ##
     +@@ Makefile: all::
     + #
     + # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
     + #
     ++# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
     ++#
     + # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
     + # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
     + #
      @@ Makefile: ifdef HAVE_CLOCK_MONOTONIC
       	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
       endif
     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
       finish:
       	if (write_locked_index(&the_index, &lock_file,
      
     - ## builtin/update-index.c ##
     -@@
     -  */
     - #define USE_THE_INDEX_COMPATIBILITY_MACROS
     - #include "cache.h"
     -+#include "bulk-checkin.h"
     - #include "config.h"
     - #include "lockfile.h"
     - #include "quote.h"
     -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     - 		struct strbuf unquoted = STRBUF_INIT;
     - 
     - 		setup_work_tree();
     -+		plug_bulk_checkin();
     - 		while (getline_fn(&buf, stdin) != EOF) {
     - 			char *p;
     - 			if (!nul_term_line && buf.buf[0] == '"') {
     -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     - 				chmod_path(set_executable_bit, p);
     - 			free(p);
     - 		}
     -+		unplug_bulk_checkin(&lock_file);
     - 		strbuf_release(&unquoted);
     - 		strbuf_release(&buf);
     - 	}
     -
       ## bulk-checkin.c ##
      @@
        */
     @@ bulk-checkin.c
       #include "repository.h"
       #include "csum-file.h"
       #include "pack.h"
     -@@
     + #include "strbuf.h"
     ++#include "string-list.h"
       #include "packfile.h"
       #include "object-store.h"
       
     -+struct object_rename {
     -+	char *src;
     -+	char *dst;
     -+};
     -+
     -+static struct bulk_rename_state {
     -+	struct object_rename *renames;
     -+	uint32_t alloc_renames;
     -+	uint32_t nr_renames;
     -+} bulk_rename_state;
     -+
     - static struct bulk_checkin_state {
     - 	unsigned plugged:1;
     + static int bulk_checkin_plugged;
       
     -@@ bulk-checkin.c: static struct bulk_checkin_state {
     - 	struct pack_idx_entry **written;
     - 	uint32_t alloc_written;
     - 	uint32_t nr_written;
     --} state;
     ++static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
      +
     -+} bulk_checkin_state;
     - 
     - static void finish_bulk_checkin(struct bulk_checkin_state *state)
     - {
     - 	struct object_id oid;
     - 	struct strbuf packname = STRBUF_INIT;
     - 	int i;
     -+	unsigned old_plugged;
     - 
     - 	if (!state->f)
     - 		return;
     -@@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state)
     - 
     - clear_exit:
     - 	free(state->written);
     -+	old_plugged = state->plugged;
     - 	memset(state, 0, sizeof(*state));
     -+	state->plugged = old_plugged;
     - 
     - 	strbuf_release(&packname);
     - 	/* Make objects we just wrote available to ourselves */
     + static struct bulk_checkin_state {
     + 	char *pack_tmp_name;
     + 	struct hashfile *f;
     +@@ bulk-checkin.c: clear_exit:
       	reprepare_packed_git(the_repository);
       }
       
     -+static void do_sync_and_rename(struct bulk_rename_state *state, struct lock_file *lock_file)
     ++static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
      +{
     -+	if (state->nr_renames) {
     -+		int i;
     ++	if (fsync_state->nr) {
     ++		struct string_list_item *rename;
      +
      +		/*
      +		 * Issue a full hardware flush against the lock file to ensure
     @@ bulk-checkin.c: static void finish_bulk_checkin(struct bulk_checkin_state *state
      +		 */
      +		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
      +
     -+		for (i = 0; i < state->nr_renames; i++) {
     -+			if (finalize_object_file(state->renames[i].src, state->renames[i].dst))
     -+				die_errno(_("could not rename '%s'"), state->renames[i].src);
     ++		for_each_string_list_item(rename, fsync_state) {
     ++			const char *src = rename->string;
     ++			const char *dst = rename->util;
      +
     -+			free(state->renames[i].src);
     -+			free(state->renames[i].dst);
     ++			if (finalize_object_file(src, dst))
     ++				die_errno(_("could not rename '%s' to '%s'"), src, dst);
      +		}
      +
     -+		free(state->renames);
     -+		memset(state, 0, sizeof(*state));
     ++		string_list_clear(fsync_state, 1);
      +	}
      +}
      +
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
       	return 0;
       }
       
     -+static void add_rename_bulk_checkin(struct bulk_rename_state *state,
     ++static void add_rename_bulk_checkin(struct string_list *fsync_state,
      +				    const char *src, const char *dst)
      +{
     -+	struct object_rename *rename;
     -+
     -+	ALLOC_GROW(state->renames, state->nr_renames + 1, state->alloc_renames);
     -+
     -+	rename = &state->renames[state->nr_renames++];
     -+	rename->src = xstrdup(src);
     -+	rename->dst = xstrdup(dst);
     ++	string_list_insert(fsync_state, src)->util = xstrdup(dst);
      +}
      +
      +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
      +					      const char *filename)
      +{
     -+	if (fsync_object_files) {
     ++	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
      +		/*
      +		 * If we have a plugged bulk checkin, we issue a call that
      +		 * cleans the filesystem page cache but avoids a hardware flush
      +		 * command. Later on we will issue a single hardware flush
      +		 * before renaming files as part of do_sync_and_rename.
      +		 */
     -+		if (bulk_checkin_state.plugged &&
     -+		    fsync_object_files == 2 &&
     ++		if (bulk_checkin_plugged &&
     ++		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
      +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
     -+			add_rename_bulk_checkin(&bulk_rename_state, tmpfile, filename);
     ++			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
      +			if (close(fd))
      +				die_errno(_("error when closing loose object file"));
      +
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
       int index_bulk_checkin(struct object_id *oid,
       		       int fd, size_t size, enum object_type type,
       		       const char *path, unsigned flags)
     - {
     --	int status = deflate_to_pack(&state, oid, fd, size, type,
     -+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
     - 				     path, flags);
     --	if (!state.plugged)
     --		finish_bulk_checkin(&state);
     -+	if (!bulk_checkin_state.plugged)
     -+		finish_bulk_checkin(&bulk_checkin_state);
     - 	return status;
     - }
     - 
     - void plug_bulk_checkin(void)
     - {
     --	state.plugged = 1;
     -+	bulk_checkin_state.plugged = 1;
     +@@ bulk-checkin.c: void plug_bulk_checkin(void)
     + 	bulk_checkin_plugged = 1;
       }
       
      -void unplug_bulk_checkin(void)
      +void unplug_bulk_checkin(struct lock_file *lock_file)
       {
     --	state.plugged = 0;
     --	if (state.f)
     --		finish_bulk_checkin(&state);
     -+	bulk_checkin_state.plugged = 0;
     -+	if (bulk_checkin_state.f)
     -+		finish_bulk_checkin(&bulk_checkin_state);
     + 	assert(bulk_checkin_plugged);
     + 	bulk_checkin_plugged = 0;
     + 	if (bulk_checkin_state.f)
     + 		finish_bulk_checkin(&bulk_checkin_state);
      +
     -+	do_sync_and_rename(&bulk_rename_state, lock_file);
     ++	do_sync_and_rename(&bulk_fsync_state, lock_file);
       }
      
       ## bulk-checkin.h ##
     @@ bulk-checkin.h
       
       #endif
      
     + ## cache.h ##
     +@@ cache.h: void reset_shared_repository(void);
     + extern int read_replace_refs;
     + extern char *git_replace_ref_base;
     + 
     +-extern int fsync_object_files;
     ++enum FSYNC_OBJECT_FILES_MODE {
     ++    FSYNC_OBJECT_FILES_OFF,
     ++    FSYNC_OBJECT_FILES_ON,
     ++    FSYNC_OBJECT_FILES_BATCH
     ++};
     ++
     ++extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
     + extern int core_preload_index;
     + extern int precomposed_unicode;
     + extern int protect_hfs;
     +
       ## config.c ##
      @@ config.c: static int git_default_core_config(const char *var, const char *value, void *cb)
       	}
       
       	if (!strcmp(var, "core.fsyncobjectfiles")) {
      -		fsync_object_files = git_config_bool(var, value);
     -+		int is_bool;
     -+
     -+		fsync_object_files = git_config_bool_or_int(var, value, &is_bool);
     ++		if (!value)
     ++			return config_error_nonbool(var);
     ++		if (!strcasecmp(value, "batch"))
     ++			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
     ++		else
     ++			fsync_object_files = git_config_bool(var, value)
     ++				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
       		return 0;
       	}
       
     @@ configure.ac: AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
       # Define NO_SETITIMER if you don't have setitimer.
       GIT_CHECK_FUNC(setitimer,
      
     + ## environment.c ##
     +@@ environment.c: const char *git_hooks_path;
     + int zlib_compression_level = Z_BEST_SPEED;
     + int core_compression_level;
     + int pack_compression_level = Z_DEFAULT_COMPRESSION;
     +-int fsync_object_files;
     ++enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
     + size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
     + size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
     + size_t delta_base_cache_limit = 96 * 1024 * 1024;
     +
       ## git-compat-util.h ##
      @@ git-compat-util.h: __attribute__((format (printf, 1, 2))) NORETURN
       void BUG(const char *fmt, ...);
 -:  ----------- > 4:  546ad9c82e8 core.fsyncobjectfiles: add windows support for batch mode
 -:  ----------- > 5:  d8843185fe4 update-index: use the bulk-checkin infrastructure
 -:  ----------- > 6:  73b5d41be94 core.fsyncobjectfiles: performance tests for add and stash

-- 
gitgitgadget


* [PATCH v2 1/6] object-file: use futimens rather than utime
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
@ 2021-08-27 23:49   ` Neeraj Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Make close_loose_object do all of the steps for syncing and correctly
naming a new loose object so that it can be reimplemented in the
upcoming bulk-fsync mode.

Use futimens, which is available in POSIX.1-2008, to update the file
timestamps. This should be slightly faster than utime, since we have
a file descriptor already available. This change allows us to update
the time before closing, renaming, and potentially fsyncing the file
being refreshed. This code is currently only invoked by git-pack-objects
via force_object_loose.
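
The ordering change can be sketched in isolation: the timestamps are set through the descriptor that is already open, before the file is closed. The helper name `set_times_and_close` is hypothetical and is not part of Git. The sketch also leaves out the fsync and rename that close_loose_object performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: set atime and mtime of an already-open file
 * to 'mtime' with futimens(2), then close it. The fsync/rename that
 * Git performs afterwards is intentionally omitted.
 * Returns 0 on success, -1 on failure.
 */
static int set_times_and_close(int fd, time_t mtime)
{
	struct timespec ts[2] = {{0}};
	int ret = 0;

	ts[0].tv_sec = mtime; /* access time */
	ts[1].tv_sec = mtime; /* modification time */
	if (futimens(fd, ts) < 0) {
		perror("futimens");
		ret = -1;
	}
	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

Because the descriptor is reused, no second path lookup is needed. The timestamp update therefore happens before the file is published under its final name.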

Implement a futimens shim for the Windows port of Git.
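
The shim mostly does epoch arithmetic. FILETIME counts 100-ns intervals since 1601-01-01 UTC, split into two 32-bit halves. The portable sketch below mirrors timespec_to_filetime and filetime_to_timespec from the diff. It uses a hypothetical `filetime_sketch` struct in place of the Windows FILETIME type, so it is an illustration that builds off Windows, not the Git code.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* 100-ns intervals between 1601-01-01 and 1970-01-01 (UTC). */
#define UNIX_EPOCH_FILETIME 116444736000000000LL

/* Hypothetical stand-in for the Windows FILETIME struct. */
struct filetime_sketch {
	uint32_t low;
	uint32_t high;
};

/* Unix timespec -> 64-bit FILETIME value split into two halves. */
static void timespec_to_filetime_sketch(const struct timespec *t,
					struct filetime_sketch *ft)
{
	long long win = t->tv_sec * 10000000LL + t->tv_nsec / 100
			+ UNIX_EPOCH_FILETIME;
	ft->low = (uint32_t)win;
	ft->high = (uint32_t)(win >> 32);
}

/* Inverse conversion; sub-100ns precision is lost on the way in. */
static void filetime_to_timespec_sketch(const struct filetime_sketch *ft,
					struct timespec *t)
{
	long long hnsec = ((long long)ft->high << 32) + ft->low
			  - UNIX_EPOCH_FILETIME;
	t->tv_sec = hnsec / 10000000;
	t->tv_nsec = (hnsec % 10000000) * 100;
}
```

A Unix time of zero maps exactly to UNIX_EPOCH_FILETIME. Any tv_nsec that is a multiple of 100 survives a round trip unchanged.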

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.c | 53 ++++++++++++++++++++++++++++++++++----------------
 compat/mingw.h |  2 ++
 object-file.c  | 17 ++++++++--------
 3 files changed, 46 insertions(+), 26 deletions(-)

diff --git a/compat/mingw.c b/compat/mingw.c
index 9e0cd1e097f..ce14b21c182 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -734,11 +734,14 @@ int mingw_chmod(const char *filename, int mode)
  * The unit of FILETIME is 100-nanoseconds since January 1, 1601, UTC.
  * Returns the 100-nanoseconds ("hekto nanoseconds") since the epoch.
  */
+
+#define UNIX_EPOCH_FILETIME 116444736000000000LL
+
 static inline long long filetime_to_hnsec(const FILETIME *ft)
 {
 	long long winTime = ((long long)ft->dwHighDateTime << 32) + ft->dwLowDateTime;
 	/* Windows to Unix Epoch conversion */
-	return winTime - 116444736000000000LL;
+	return winTime - UNIX_EPOCH_FILETIME;
 }
 
 static inline void filetime_to_timespec(const FILETIME *ft, struct timespec *ts)
@@ -748,6 +751,13 @@ static inline void filetime_to_timespec(const FILETIME *ft, struct timespec *ts)
 	ts->tv_nsec = (hnsec % 10000000) * 100;
 }
 
+static inline void timespec_to_filetime(const struct timespec *t, FILETIME *ft)
+{
+	long long winTime = t->tv_sec * 10000000LL + t->tv_nsec / 100 + UNIX_EPOCH_FILETIME;
+	ft->dwLowDateTime = winTime;
+	ft->dwHighDateTime = winTime >> 32;
+}
+
 /**
  * Verifies that safe_create_leading_directories() would succeed.
  */
@@ -949,19 +959,33 @@ int mingw_fstat(int fd, struct stat *buf)
 	}
 }
 
-static inline void time_t_to_filetime(time_t t, FILETIME *ft)
+int mingw_futimens(int fd, const struct timespec times[2])
 {
-	long long winTime = t * 10000000LL + 116444736000000000LL;
-	ft->dwLowDateTime = winTime;
-	ft->dwHighDateTime = winTime >> 32;
+	FILETIME mft, aft;
+
+	if (times) {
+		timespec_to_filetime(&times[0], &aft);
+		timespec_to_filetime(&times[1], &mft);
+	} else {
+		GetSystemTimeAsFileTime(&mft);
+		aft = mft;
+	}
+
+	if (!SetFileTime((HANDLE)_get_osfhandle(fd), NULL, &aft, &mft)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
 }
 
-int mingw_utime (const char *file_name, const struct utimbuf *times)
+int mingw_utime(const char *file_name, const struct utimbuf *times)
 {
-	FILETIME mft, aft;
 	int fh, rc;
 	DWORD attrs;
 	wchar_t wfilename[MAX_PATH];
+	struct timespec ts[2];
+
 	if (xutftowcs_path(wfilename, file_name) < 0)
 		return -1;
 
@@ -979,17 +1003,12 @@ int mingw_utime (const char *file_name, const struct utimbuf *times)
 	}
 
 	if (times) {
-		time_t_to_filetime(times->modtime, &mft);
-		time_t_to_filetime(times->actime, &aft);
-	} else {
-		GetSystemTimeAsFileTime(&mft);
-		aft = mft;
+		memset(ts, 0, sizeof(ts));
+		ts[0].tv_sec = times->actime;
+		ts[1].tv_sec = times->modtime;
 	}
-	if (!SetFileTime((HANDLE)_get_osfhandle(fh), NULL, &aft, &mft)) {
-		errno = EINVAL;
-		rc = -1;
-	} else
-		rc = 0;
+
+	rc = mingw_futimens(fh, times ? ts : NULL);
 	close(fh);
 
 revert_attrs:
diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..87944dfec72 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -398,6 +398,8 @@ int mingw_fstat(int fd, struct stat *buf);
 
 int mingw_utime(const char *file_name, const struct utimbuf *times);
 #define utime mingw_utime
+int mingw_futimens(int fd, const struct timespec times[2]);
+#define futimens mingw_futimens
 size_t mingw_strftime(char *s, size_t max,
 		   const char *format, const struct tm *tm);
 #define strftime mingw_strftime
diff --git a/object-file.c b/object-file.c
index a8be8994814..5421811273e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1860,12 +1860,13 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 }
 
 /* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
+static int close_loose_object(int fd, const char *tmpfile, const char *filename)
 {
 	if (fsync_object_files)
 		fsync_or_die(fd, "loose object file");
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
+	return finalize_object_file(tmpfile, filename);
 }
 
 /* Size of directory component, including the ending '/' */
@@ -1973,17 +1974,15 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
 	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
+		struct timespec ts[2] = {0};
+		ts[0].tv_sec = mtime;
+		ts[1].tv_sec = mtime;
+		if (futimens(fd, ts) < 0)
+			warning_errno(_("failed futimens() on %s"), tmp_file.buf);
 	}
 
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return close_loose_object(fd, tmp_file.buf, filename.buf);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
-- 
gitgitgadget



* [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
@ 2021-08-27 23:49   ` Neeraj Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'.  This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
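
The bug can be reduced to a small example. The struct and function names below are hypothetical simplifications of bulk_checkin_state and finish_bulk_checkin. When the flag lives inside the struct, the memset that resets the pack state also clears 'plugged'. A separate static survives the reset.

```c
#include <assert.h>
#include <string.h>

/* Old layout (simplified): the flag lives inside the zeroed state. */
static struct old_state {
	unsigned plugged:1;
	int nr_written;
} old_state;

static void old_finish(struct old_state *s)
{
	memset(s, 0, sizeof(*s)); /* also clears 'plugged' */
}

/* New layout (simplified): the flag is a separate static. */
static int new_plugged;
static struct new_state {
	int nr_written;
} new_state;

static void new_finish(struct new_state *s)
{
	memset(s, 0, sizeof(*s)); /* 'new_plugged' is untouched */
}
```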

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget



* [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-08-27 23:49   ` Neeraj Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
macOS, and Linux each offer mechanisms to write data from the filesystem
page cache without initiating a hardware flush.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
takes advantage of the bulk-checkin infrastructure to batch up hardware
flushes.

When the new mode is enabled we do the following for new objects:

1. Create a tmp_obj_XXXX file and write the object data to it.
2. Issue a pagecache writeback request and wait for it to complete.
3. Record the tmp name and the final name in the bulk-checkin state for
   later rename.

At the end of the entire transaction we:
1. Issue an fsync against the lock file to flush the hardware writeback
   cache, which should by now have processed the tmp file writes.
2. Rename all of the temp files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation.
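
The batch sequence above can be sketched in portable C roughly as follows. This is a minimal illustration under stated assumptions, not the code in this patch: `batch_write_objects`, `writeout_only` and `write_all` are hypothetical names. The writeout-only step uses sync_file_range() only on Linux and otherwise falls back to fsync(), and the single full flush uses plain fsync() (the patch uses F_FULLFSYNC on macOS instead). Like the patch, it assumes that one full flush drains the device write cache holding the earlier writes.

```c
#define _GNU_SOURCE /* exposes sync_file_range() with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Push dirty pages to the device without requesting a cache flush. */
static int writeout_only(int fd)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				       SYNC_FILE_RANGE_WRITE |
				       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	return fsync(fd);	/* assumption: no writeout-only primitive */
#endif
}

/*
 * Hypothetical helper: stage data[i] in tmp[i] with writeout only, issue
 * ONE full flush on flush_fd, then rename every tmp[i] to dst[i].
 */
static int batch_write_objects(const char *const *tmp, const char *const *dst,
			       const char *const *data, size_t n, int flush_fd)
{
	size_t i;

	assert(flush_fd >= 0);
	for (i = 0; i < n; i++) {
		int fd = open(tmp[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);

		if (fd < 0)
			return -1;
		/* As in the patch, fall back to a full fsync on failure. */
		if (write_all(fd, data[i], strlen(data[i])) ||
		    (writeout_only(fd) < 0 && fsync(fd) < 0)) {
			close(fd);
			return -1;
		}
		if (close(fd))
			return -1;
	}

	/* A single full flush is assumed to cover all writes above. */
	if (fsync(flush_fd) < 0)
		return -1;

	/* Only now do the objects become visible under their final names. */
	for (i = 0; i < n; i++)
		if (rename(tmp[i], dst[i]))
			return -1;
	return 0;
}
```

The renames are deliberately deferred until after the flush, so a crash can leave stray temporary files but never a final-named object whose data has not reached stable storage.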

On a filesystem with a single journal that is updated during name
operations (e.g. create, link, rename), such as NTFS and HFS+, we
would expect the fsync to trigger a journal writeout, so this
sequence should be enough to ensure that the user's data is durable by
the time the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability, since a plain fsync(2) call
does not flush any hardware caches.
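
A minimal sketch of such a full-flush helper, assuming POSIX plus the macOS-only F_FULLFSYNC fcntl command. The name `full_flush` is hypothetical, and unlike git_fsync in this patch it falls back to plain fsync() if F_FULLFSYNC is rejected (for example by a filesystem that does not support it); that fallback is an assumption, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush fd all the way to stable storage.
 * On macOS, fsync(2) alone does not flush the drive's write cache,
 * so try fcntl(F_FULLFSYNC) first where the platform defines it.
 * Assumption (not in the patch): if F_FULLFSYNC fails, fall back
 * to plain fsync(), retrying on EINTR.
 */
static int full_flush(int fd)
{
	int ret;

#ifdef F_FULLFSYNC
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return 0;
#endif
	do {
		ret = fsync(fd);
	} while (ret < 0 && errno == EINTR);
	return ret;
}
```

On Linux and other systems without F_FULLFSYNC the helper reduces to a plain fsync() loop.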

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times are reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53
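
To put the table in proportion, the sketch below derives the speedup of `batch` over `true` and the per-file cost of `true` from the numbers above (500 files). The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds to 'git add' 500 files, copied from the table above. */
struct timing {
	const char *os;
	double off, on, batch;
};

static const struct timing timings[] = {
	{ "Linux",   0.06,  1.88, 0.15 },
	{ "Mac",     0.35, 11.18, 0.41 },
	{ "Windows", 0.61,  2.47, 1.53 },
};

/* How many times faster batch mode is than per-object fsync ("true"). */
static double batch_speedup(const struct timing *t)
{
	assert(t->batch > 0);
	return t->on / t->batch;
}

/* Milliseconds per file when every object is fsynced individually. */
static double fsync_ms_per_file(const struct timing *t, int files)
{
	assert(files > 0);
	return 1000.0 * t->on / files;
}

void print_timing_summary(void)
{
	size_t i;

	for (i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
		printf("%-8s batch is %.1fx faster than true; true costs %.2f ms/file\n",
		       timings[i].os, batch_speedup(&timings[i]),
		       fsync_ms_per_file(&timings[i], 500));
}
```

With the table's values this gives roughly 12.5x (Linux), 27.3x (Mac) and 1.6x (Windows), with `true` costing about 3.76, 22.36 and 4.94 ms per file respectively.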

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 26 ++++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  3 +-
 bulk-checkin.c                | 70 ++++++++++++++++++++++++++++++++++-
 bulk-checkin.h                |  4 +-
 cache.h                       |  8 +++-
 config.c                      |  8 +++-
 config.mak.uname              |  2 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 12 +-----
 wrapper.c                     | 36 ++++++++++++++++++
 write-or-die.c                |  2 +-
 14 files changed, 170 insertions(+), 24 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..0006d90980d 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,26 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls the object store, so updates to any refs or the
+	index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write object data with a minimal set of FLUSH CACHE
+  (or equivalent) commands sent to the storage controller. If the operating
+  system interfaces are not available, this mode behaves the same as `true`.
+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
+  filesystems and on Windows for repos stored on NTFS or ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 9573190f1d7..143d30f04cb 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 09e684585d9..c58dfcd4bc3 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -670,7 +670,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..47b42f610c0 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,15 +3,19 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -62,6 +66,32 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
+{
+	if (fsync_state->nr) {
+		struct string_list_item *rename;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for_each_string_list_item(rename, fsync_state) {
+			const char *src = rename->string;
+			const char *dst = rename->util;
+
+			if (finalize_object_file(src, dst))
+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
+		}
+
+		string_list_clear(fsync_state, 1);
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +286,42 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct string_list *fsync_state,
+				    const char *src, const char *dst)
+{
+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename)
+{
+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_plugged &&
+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
+			if (close(fd))
+				die_errno(_("error when closing loose object file"));
+
+			return 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	return finalize_object_file(tmpfile, filename);
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -273,10 +339,12 @@ void plug_bulk_checkin(void)
 	bulk_checkin_plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
 	assert(bulk_checkin_plugged);
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_fsync_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..8efb01ed669 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,13 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile, const char *filename);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/cache.h b/cache.h
index bd4869beee4..cde6c6ae6b1 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE {
+	FSYNC_OBJECT_FILES_OFF,
+	FSYNC_OBJECT_FILES_ON,
+	FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index f33abeab851..ab1980f8fec 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,13 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (!value)
+			return config_error_nonbool(var);
+		if (!strcasecmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else
+			fsync_object_files = git_config_bool(var, value)
+				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 69413fb3dc0..8c07f2265a8 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
@@ -133,6 +134,7 @@ ifeq ($(uname_S),Darwin)
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 	BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1
+	BASIC_CFLAGS += -DFSYNC_DOESNT_FLUSH=1
 	HAVE_BSD_SYSCTL = YesPlease
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d6b22ede7ea..3e23eafff80 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+	FSYNC_WRITEOUT_ONLY,
+	FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index 5421811273e..94a63809613 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,16 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static int close_loose_object(int fd, const char *tmpfile, const char *filename)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-	return finalize_object_file(tmpfile, filename);
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1982,7 +1972,7 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
 	}
 
-	return close_loose_object(fd, tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf, filename.buf);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 563ad590df1..37a8b61a7df 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -538,6 +538,42 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	if (action == FSYNC_WRITEOUT_ONLY) {
+#ifdef __APPLE__
+		/*
+		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+	}
+
+#ifdef __APPLE__
+	return fcntl(fd, F_FULLFSYNC);
+#else
+	return fsync(fd);
+#endif
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget



* [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-08-27 23:49   ` [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-08-27 23:49   ` Neeraj Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function it uses is
available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.

The operating system is told to flush data only without a hardware
flush primitive. A later full fsync will cause the metadata log
to be flushed and then the disk cache to be flushed on NTFS and
ReFS. Other filesystems will treat this as a full flush operation.

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 29 +++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index 87944dfec72..b5c950f1e30 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..c013920ce37
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,29 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+	IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+	DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+	if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+	}
+
+	/* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
+	memset(&io_status, 0, sizeof(io_status));
+	if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index 8c07f2265a8..ef1fd109b74 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -450,6 +450,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -624,6 +625,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index 37a8b61a7df..d951306b33e 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -563,6 +563,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 	}
-- 
gitgitgadget



* [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-08-27 23:49   ` [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-08-27 23:49   ` Neeraj Singh via GitGitGadget
  2021-08-27 23:49   ` [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables the bulk-checkin infrastructure for update-index to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality. This mode
is enabled when passing paths to update-index via the --stdin flag,
as is done by 'git stash'.

There is some risk with this change, since under batch fsync the object
files will not be available until update-index has entirely completed.
A caller that depends on seeing objects earlier is unlikely, since any
tool invoking update-index and expecting to see objects would have to
snoop the output of --verbose to find out when update-index has actually
processed a given path. Additionally, the index is locked for the
duration of the update.
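
The visibility trade-off described above can be modelled in a few lines. This is a hypothetical in-memory sketch (the names `plug`, `unplug`, `add_object` and `is_visible` are illustrative, not Git's API): while plugged, objects are queued rather than renamed into place, so they only become visible at unplug time, which is where a real implementation would issue its single full flush. The asserts mirror the ones this series adds to plug_bulk_checkin() and unplug_bulk_checkin().

```c
#include <assert.h>
#include <string.h>

#define MAX_PENDING 64

static int plugged;
static const char *pending[MAX_PENDING];	/* queued, not yet renamed */
static size_t nr_pending;
static const char *visible[MAX_PENDING];	/* "renamed into place" */
static size_t nr_visible;

static void publish(const char *name)
{
	assert(nr_visible < MAX_PENDING);
	visible[nr_visible++] = name;
}

static void add_object(const char *name)
{
	if (plugged) {
		assert(nr_pending < MAX_PENDING);
		pending[nr_pending++] = name;	/* defer the rename */
	} else {
		publish(name);			/* rename immediately */
	}
}

static void plug(void)
{
	assert(!plugged);	/* nested plugging is a bug */
	plugged = 1;
}

static void unplug(void)
{
	size_t i;

	assert(plugged);
	plugged = 0;
	/* A real implementation would issue one full flush here. */
	for (i = 0; i < nr_pending; i++)
		publish(pending[i]);
	nr_pending = 0;
}

static int is_visible(const char *name)
{
	size_t i;

	for (i = 0; i < nr_visible; i++)
		if (!strcmp(visible[i], name))
			return 1;
	return 0;
}
```

A tool that polls for an object between plug() and unplug() would not find it, which is exactly the risk described above.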

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index f1f16f2de52..64d025cf49e 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1152,6 +1153,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		struct strbuf unquoted = STRBUF_INIT;
 
 		setup_work_tree();
+		plug_bulk_checkin();
 		while (getline_fn(&buf, stdin) != EOF) {
 			char *p;
 			if (!nul_term_line && buf.buf[0] == '"') {
@@ -1166,6 +1168,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				chmod_path(set_executable_bit, p);
 			free(p);
 		}
+		unplug_bulk_checkin(&lock_file);
 		strbuf_release(&unquoted);
 		strbuf_release(&buf);
 	}
-- 
gitgitgadget



* [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-08-27 23:49   ` [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-08-27 23:49   ` Neeraj Singh via GitGitGadget
  2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-08-27 23:49 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/lib-unique-files.sh | 32 ++++++++++++++++++++++++++
 t/perf/p3700-add.sh        | 43 +++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh      | 46 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)
 create mode 100644 t/perf/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/lib-unique-files.sh b/t/perf/lib-unique-files.sh
new file mode 100644
index 00000000000..10083395ae5
--- /dev/null
+++ b/t/perf/lib-unique-files.sh
@@ -0,0 +1,32 @@
+# Helper to create files with unique contents
+
+test_create_unique_files_base__=$(date -u)
+test_create_unique_files_counter__=0
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
+#				    each in the current directory, all
+#				    with unique contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir" > /dev/null
+		for j in $(test_seq $files)
+		do
+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..4ca3224f364
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/perf/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..407b95c104b
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/perf/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget


* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-26  5:50       ` Christoph Hellwig
@ 2021-08-28  0:20         ` Neeraj Singh
  2021-08-28  6:57           ` Christoph Hellwig
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-08-28  0:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ævar Arnfjörð Bjarmason,
	Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Neeraj Singh

On Wed, Aug 25, 2021 at 10:50 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Wed, Aug 25, 2021 at 05:49:45PM -0700, Neeraj Singh wrote:
> > One conclusion from reviewing that thread is that as of then,
> > sync_file_ranges isn't actually enough
> > to make a hard guarantee about writeout occurring. See
> > https://lore.kernel.org/linux-fsdevel/20190319204330.GY26298@dastard/.
> > My hope is that the Linux FS developers have rectified that shortcoming by now.
>
> I'm not sure what shortcoming you mean.  sync_file_ranges is a system
> call that only causes data writeback.  It never performs metadata write
> back and thus is not an integrity operation at all.  That is also very
> clearly documented in the man page.
>

You're right. On re-reading the man page, sync_file_range is listed as
an "extremely dangerous" system call. The opportunity in the Linux kernel
is to offer an alternative set of flags or a separate API that allows an
application like Git to separate a metadata writeback request from the
disk flush.

Separately, I'm hoping I can push from the Windows filesystem side to get
a barrier primitive put into the NVMe standard, so that we can offer more
useful behavior to applications than these painful hardware flushes.


* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-28  0:20         ` Neeraj Singh
@ 2021-08-28  6:57           ` Christoph Hellwig
  2021-08-31 19:59             ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Christoph Hellwig @ 2021-08-28  6:57 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Neeraj Singh

On Fri, Aug 27, 2021 at 05:20:44PM -0700, Neeraj Singh wrote:
> You're right. On re-read of the man page, sync_file_range is listed as
> an "extremely dangerous" system call.  The opportunity in the linux
> kernel is to offer an alternative set of flags or separate API that
> allows for an application like Git to separate a metadata writeback
> request from the disk flush.

How do you want to do that?  A metadata writeback without a cache flush
is worse than useless; in fact, it is generally actively harmful.

To take XFS as an example, fsync and fdatasync do the following:

 1) write back all dirty data for the file to the data device
 2) flush the write cache of the data device to ensure the data is really
    on disk before writing back the metadata referring to it
 3) write out the log up to the log sequence that contained the last
    modifications to the file
 4) flush the cache for the log device.
    If the data device and the log device are the same (they usually are
    in common setups) and the log device supports the FUA bit, which
    writes through the cache, the log writes use that bit and this step
    can be skipped.

So in general there are very few metadata writes, and it is absolutely
essential to flush the cache before that, because otherwise your metadata
could point to data that might not actually have made it to disk.

The best way to optimize such a workload is to first batch all the
data writeout for multiple files in step one, then do only one cache
flush and one log force (as we call it) to cover all the files.  syncfs
will do that, but without a good way to pick individual files.
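The batching pattern described above can be sketched in user space on Linux roughly as follows. This is a minimal illustration under stated assumptions, not Git or XFS code: the helper name `write_files_then_syncfs` is hypothetical, and it relies on the Linux-specific `syncfs(2)`, which (as noted) covers the whole file system containing the directory rather than only the chosen files.

```c
#define _GNU_SOURCE /* exposes syncfs(2) and O_DIRECTORY in glibc */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Git or kernel code): create n small files in
 * dirpath without syncing any of them individually, then call syncfs()
 * once.  syncfs() writes back all dirty data on the file system that
 * contains dirpath and then flushes the device cache / forces the log,
 * so one flush covers the whole batch (plus any unrelated dirty data on
 * that file system).  Returns 0 on success, -1 on error.
 */
static int write_files_then_syncfs(const char *dirpath,
				   const char *const *names,
				   const char *const *contents, size_t n)
{
	char path[4096];
	int dir_fd, ret;

	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(contents[i]);
		int fd, r;

		r = snprintf(path, sizeof(path), "%s/%s", dirpath, names[i]);
		if (r < 0 || (size_t)r >= sizeof(path))
			return -1;
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0)
			return -1;
		/* No per-file fsync(): the data only reaches the page cache. */
		if (write(fd, contents[i], len) != (ssize_t)len) {
			close(fd);
			return -1;
		}
		if (close(fd) < 0)
			return -1;
	}

	/* One syncfs() for the whole batch instead of n fsync() calls. */
	dir_fd = open(dirpath, O_RDONLY | O_DIRECTORY);
	if (dir_fd < 0)
		return -1;
	ret = syncfs(dir_fd);
	close(dir_fd);
	return ret < 0 ? -1 : 0;
}
```

The trade-off mentioned in the thread shows up in the last call: syncfs() cannot be restricted to the files that were just written.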

> Separately, I'm hoping I can push from the Windows filesystem side to
> get a barrier primitive put into the NVME standard so that we can offer
> more useful behavior to applications rather than these painful
> hardware flushes.

I'm not sure what you mean by barriers, but if you mean the concept
of imposing a global ordering on I/Os, as we did in Linux back in the
bad old days with the barrier bio flag, or as badly reinvented by this paper:

  https://www.usenix.org/conference/fast18/presentation/won

they might help a little bit with single threaded operations, but will
heavily degrade I/O performance for multithreaded workloads.  As an
active member of (but not speaking for) the NVMe technical working group
with a bit of knowledge of SSD internals I also doubt it will be very
well received there.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-28  6:57           ` Christoph Hellwig
@ 2021-08-31 19:59             ` Neeraj Singh
  2021-09-01  5:09               ` Christoph Hellwig
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-08-31 19:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ævar Arnfjörð Bjarmason,
	Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Neeraj Singh

On Fri, Aug 27, 2021 at 11:57 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Fri, Aug 27, 2021 at 05:20:44PM -0700, Neeraj Singh wrote:
> > You're right. On re-read of the man page, sync_file_range is listed as
> > an "extremely dangerous" system call.  The opportunity in the linux
> > kernel is to offer an alternative set of flags or separate API that
> > allows for an application like Git to separate a metadata writeback
> > request from the disk flush.
>
> How do you want to do that?  A metadata writeback without a cache flush
> is worse than useless, in fact it is generally actively harmful.
>
> To take XFS as an example:  fsync and fdatasync do the following thing:
>
>  1) writeback all dirty data for file to the data device
>  2) flush the write cache of the data device to ensure they are really
>     on disk before writing back the metadata referring to them
>  3) write out the log up until the log sequence that contained the last
>     modifications to the file
>  4) flush the cache for the log device.
>     If the data device and the log device are the same (they usually are
>     for common setups) and the log device supports the FUA bit that writes
>     through the cache, the log writes use that bit and this step can
>     be skipped.
>
> So in general there are very few metadata writes, and it is absolutely
> essential to flush the cache before that, because otherwise your metadata
> could point to data that might not actually have made it to disk.
>
> The best way to optimize such a workload is by first batching all the
> data writeout for multiple files in step one, then only doing one cache
> flush and one log force (as we call it) to cover all the files.  syncfs
> will do that, but without a good way to pick individual files.

Yes, I think we want to do step (1) of your sequence for all of the
files, then issue steps (2-4) for all files as a group.  Of course, if
the log fills up then we can flush the intermediate steps.  The
unfortunate thing is that there's no Linux interface to do step (1) and
to also ensure that the relevant data is in the log stream or is
otherwise available to be part of the durable metadata.

It seems to me that XFS would be compatible with this sequence if the
appropriate kernel API exists.

>
> > Separately, I'm hoping I can push from the Windows filesystem side to
> > get a barrier primitive put into the NVME standard so that we can offer
> > more useful behavior to applications rather than these painful
> > hardware flushes.
>
> I'm not sure what you mean with barriers, but if you mean the concept
> of implying a global ordering on I/Os as we did in Linux back in the
> bad old days with the barrier bio flag, or badly reinvented by this paper:
>
>   https://www.usenix.org/conference/fast18/presentation/won
>
> they might help a little bit with single threaded operations, but will
> heavily degrade I/O performance for multithreaded workloads.  As an
> active member of (but not speaking for) the NVMe technical working group
> with a bit of knowledge of SSD internals I also doubt it will be very
> well received there.

I looked at that paper and definitely agree with you about the
questionable implementation strategy they picked. I don't (yet) have
detailed knowledge of SSD internals, but it's surprising to me that
there is little value to barrier semantics within the drive as opposed
to a full durability sync. At least for Windows, we have a database (the
Registry) for which any single-threaded latency improvement would be
welcome.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes
  2021-08-31 19:59             ` Neeraj Singh
@ 2021-09-01  5:09               ` Christoph Hellwig
  0 siblings, 0 replies; 160+ messages in thread
From: Christoph Hellwig @ 2021-09-01  5:09 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Neeraj Singh

On Tue, Aug 31, 2021 at 12:59:14PM -0700, Neeraj Singh wrote:
> > So in general there are very few metadata writes, and it is absolutely
> > essential to flush the cache before that, because otherwise your metadata
> > could point to data that might not actually have made it to disk.
> >
> > The best way to optimize such a workload is by first batching all the
> > data writeout for multiple files in step one, then only doing one cache
> > flush and one log force (as we call it) to cover all the files.  syncfs
> > will do that, but without a good way to pick individual files.
> 
> Yes, I think we want to do step (1) of your sequence for all of the
> files, then issue steps (2-4) for all files as a group.  Of course, if
> the log fills up then we can flush the intermediate steps.  The
> unfortunate thing is that there's no Linux interface to do step (1) and
> to also ensure that the relevant data is in the log stream or is
> otherwise available to be part of the durable metadata.

There is also no interface to do 2-4 separately, mostly because they
are so hard to separate.  The only API I could envision is one that takes
an array of file descriptors and has the semantics of doing an fsync/
fdatasync on each of them, allowing the implementation to optimize
the order.  It would be implementable, but not quite as efficient
as syncfs.  I'm also pretty sure we've seen a few attempts at it in
the past that ran into various issues and didn't really get far.
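For illustration only, the semantics of such an array-taking call can be written as a plain user-space loop. `fsync_many` is a hypothetical name, not an existing kernel or libc interface. This fallback provides the same per-descriptor durability as calling fsync(2) on each file, but none of the batching benefit, since every fsync() still pays for its own cache flush; a kernel implementation would be free to do all the data writeback first and then a single flush and log force.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical API: make every descriptor in fds[] durable, as if
 * fsync() had been called on each one.  Keeps going after a failure so
 * the remaining files are still synced; returns 0 on success, or -1
 * with errno set from the first failure.
 */
static int fsync_many(const int *fds, size_t n)
{
	int ret = 0, saved_errno = 0;

	for (size_t i = 0; i < n; i++) {
		if (fsync(fds[i]) < 0 && ret == 0) {
			ret = -1;
			saved_errno = errno;
		}
	}
	if (ret)
		errno = saved_errno;
	return ret;
}
```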

> I looked at that paper and definitely agree with you about the
> questionable implementation strategy they picked. I don't (yet) have
> detailed knowledge of SSD internals, but it's surprising to me that
> there is little value to barrier semantics within the drive as opposed
> to a full durability sync. At least for Windows, we have a database (the
> Registry) for which any single-threaded latency improvement would be
> welcome.

The major issue with barriers, like the one in the paper above or the
one historic Linux 2.6 kernels had, is that they enforce a global
ordering.  For a software-only implementation like the Linux one this
was already bad enough, but a hardware/firmware implementation in NVMe
would mean you'd have to add global serialization to a storage
interface standard and its implementations, while these are very much
about offering parallelism.  In fact, we'd also have very similar issues
with the modern Linux block layer, which has applied some similar ideas.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-08-27 23:49   ` [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-09-07 19:44   ` Neeraj Singh
  2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
                       ` (2 more replies)
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
  7 siblings, 3 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-07 19:44 UTC (permalink / raw)
  To: Neeraj K. Singh via GitGitGadget
  Cc: Git List, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh

On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Thanks to everyone for review so far! I've responded to the previous
> feedback and changed the patch series a bit.
>
> Changes since v1:
>
>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>    to dscho's suggestion, I'm still implementing the Windows version in the
>    same patch and I'm not doing autoconf detection since this is a POSIX
>    function.
>
>  * Introduce a separate preparatory patch to the bulk-checkin infrastructure
>    to separate the 'plugged' variable and rename the 'state' variable, as
>    suggested by dscho.
>
>  * Add performance numbers to the commit message of the main bulk fsync
>    patch, as suggested by dscho.
>
>  * Add a comment about the non-thread-safety of the bulk-checkin
>    infrastructure, as suggested by avarab.
>
>  * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
>    by dscho and avarab and others.
>
>  * Add more details to Documentation/config/core.txt about the various
>    settings and their intended effects, as suggested by avarab.
>
>  * Switch to the string-list API to hold the rename state, as suggested by
>    avarab.
>
>  * Create a separate update-index patch to use bulk-checkin as suggested by
>    dscho.
>
>  * Add Windows support in the upstream git. This is done in a way that
>    should not conflict with git-for-windows.
>
>  * Add new performance tests that show the delta based on fsync mode.
>
> NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
> on Linux, since sync_file_range does not provide data integrity guarantees.
> There is currently no kernel interface suitable to achieve disk flush
> batching as is, but he suggested that he might implement a 'syncfs' variant
> on top of this patchset. This code is still useful on macOS and Windows, and
> the config documentation makes that clear.
>
> Neeraj Singh (6):
>   object-file: use futimens rather than utime
>   bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>   core.fsyncobjectfiles: batched disk flushes
>   core.fsyncobjectfiles: add windows support for batch mode
>   update-index: use the bulk-checkin infrastructure
>   core.fsyncobjectfiles: performance tests for add and stash
>
>  Documentation/config/core.txt       | 26 ++++++--
>  Makefile                            |  6 ++
>  builtin/add.c                       |  3 +-
>  builtin/update-index.c              |  3 +
>  bulk-checkin.c                      | 92 +++++++++++++++++++++++++----
>  bulk-checkin.h                      |  4 +-
>  cache.h                             |  8 ++-
>  compat/mingw.c                      | 53 +++++++++++------
>  compat/mingw.h                      |  5 ++
>  compat/win32/flush.c                | 29 +++++++++
>  config.c                            |  8 ++-
>  config.mak.uname                    |  4 ++
>  configure.ac                        |  8 +++
>  contrib/buildsystems/CMakeLists.txt |  3 +-
>  environment.c                       |  2 +-
>  git-compat-util.h                   |  7 +++
>  object-file.c                       | 23 ++------
>  t/perf/lib-unique-files.sh          | 32 ++++++++++
>  t/perf/p3700-add.sh                 | 43 ++++++++++++++
>  t/perf/p3900-stash.sh               | 46 +++++++++++++++
>  wrapper.c                           | 40 +++++++++++++
>  write-or-die.c                      |  2 +-
>  22 files changed, 389 insertions(+), 58 deletions(-)
>  create mode 100644 compat/win32/flush.c
>  create mode 100644 t/perf/lib-unique-files.sh
>  create mode 100755 t/perf/p3700-add.sh
>  create mode 100755 t/perf/p3900-stash.sh
>
>
> base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v2
> Pull-Request: https://github.com/git/git/pull/1076

Hello everyone,
I'd like to bump this review up in people's inboxes since Patch V2
hasn't gotten any traction in over a week.

Thanks in advance for taking a look,
- Neeraj Singh
Windows Core Filesystems Team

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
@ 2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
  2021-09-07 19:54     ` Randall S. Becker
  2021-09-08  0:55     ` Neeraj Singh
  2 siblings, 0 replies; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 19:50 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Neeraj K. Singh


On Tue, Sep 07 2021, Neeraj Singh wrote:

> On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> Thanks to everyone for review so far! I've responded to the previous
>> feedback and changed the patch series a bit.
>>
>> Changes since v1:
>>
>>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>>    to dscho's suggestion, I'm still implementing the Windows version in the
>>    same patch and I'm not doing autoconf detection since this is a POSIX
>>    function.
>>
>>  * Introduce a separate preparatory patch to the bulk-checkin infrastructure
>>    to separate the 'plugged' variable and rename the 'state' variable, as
>>    suggested by dscho.
>>
>>  * Add performance numbers to the commit message of the main bulk fsync
>>    patch, as suggested by dscho.
>>
>>  * Add a comment about the non-thread-safety of the bulk-checkin
>>    infrastructure, as suggested by avarab.
>>
>>  * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
>>    by dscho and avarab and others.
>>
>>  * Add more details to Documentation/config/core.txt about the various
>>    settings and their intended effects, as suggested by avarab.
>>
>>  * Switch to the string-list API to hold the rename state, as suggested by
>>    avarab.
>>
>>  * Create a separate update-index patch to use bulk-checkin as suggested by
>>    dscho.
>>
>>  * Add Windows support in the upstream git. This is done in a way that
>>    should not conflict with git-for-windows.
>>
>>  * Add new performance tests that show the delta based on fsync mode.
>>
>> NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
>> on Linux, since sync_file_range does not provide data integrity guarantees.
>> There is currently no kernel interface suitable to achieve disk flush
>> batching as is, but he suggested that he might implement a 'syncfs' variant
>> on top of this patchset. This code is still useful on macOS and Windows, and
>> the config documentation makes that clear.
>>
>> Neeraj Singh (6):
>>   object-file: use futimens rather than utime
>>   bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>>   core.fsyncobjectfiles: batched disk flushes
>>   core.fsyncobjectfiles: add windows support for batch mode
>>   update-index: use the bulk-checkin infrastructure
>>   core.fsyncobjectfiles: performance tests for add and stash
>>
>>  Documentation/config/core.txt       | 26 ++++++--
>>  Makefile                            |  6 ++
>>  builtin/add.c                       |  3 +-
>>  builtin/update-index.c              |  3 +
>>  bulk-checkin.c                      | 92 +++++++++++++++++++++++++----
>>  bulk-checkin.h                      |  4 +-
>>  cache.h                             |  8 ++-
>>  compat/mingw.c                      | 53 +++++++++++------
>>  compat/mingw.h                      |  5 ++
>>  compat/win32/flush.c                | 29 +++++++++
>>  config.c                            |  8 ++-
>>  config.mak.uname                    |  4 ++
>>  configure.ac                        |  8 +++
>>  contrib/buildsystems/CMakeLists.txt |  3 +-
>>  environment.c                       |  2 +-
>>  git-compat-util.h                   |  7 +++
>>  object-file.c                       | 23 ++------
>>  t/perf/lib-unique-files.sh          | 32 ++++++++++
>>  t/perf/p3700-add.sh                 | 43 ++++++++++++++
>>  t/perf/p3900-stash.sh               | 46 +++++++++++++++
>>  wrapper.c                           | 40 +++++++++++++
>>  write-or-die.c                      |  2 +-
>>  22 files changed, 389 insertions(+), 58 deletions(-)
>>  create mode 100644 compat/win32/flush.c
>>  create mode 100644 t/perf/lib-unique-files.sh
>>  create mode 100755 t/perf/p3700-add.sh
>>  create mode 100755 t/perf/p3900-stash.sh
>>
>>
>> base-commit: 225bc32a989d7a22fa6addafd4ce7dcd04675dbf
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v2
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v2
>> Pull-Request: https://github.com/git/git/pull/1076
>
> Hello everyone,
> I'd like to bump this review up in people's inboxes since Patch V2
> hasn't gotten any traction in over a week.
>
> Thanks in advance for taking a look,
> - Neeraj Singh
> Windows Core Filesystems Team

Thanks, I've been meaning to take a look at this, and also as a
note-to-self: check how this interacts with the fsync()-impacted race I
noted in my just-sent:
https://lore.kernel.org/git/cover-0.3-00000000000-20210907T193600Z-avarab@gmail.com/

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
  2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
@ 2021-09-07 19:54     ` Randall S. Becker
  2021-09-08  0:54       ` Neeraj Singh
  2021-09-08  0:55     ` Neeraj Singh
  2 siblings, 1 reply; 160+ messages in thread
From: Randall S. Becker @ 2021-09-07 19:54 UTC (permalink / raw)
  To: 'Neeraj Singh', 'Neeraj K. Singh via GitGitGadget'
  Cc: 'Git List', 'Johannes Schindelin',
	'Jeff King', 'Jeff Hostetler',
	'Christoph Hellwig',
	'Ævar Arnfjörð Bjarmason',
	'Neeraj K. Singh'

On September 7, 2021 3:44 PM, Neeraj Singh wrote:
>On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>
>> Thanks to everyone for review so far! I've responded to the previous
>> feedback and changed the patch series a bit.
>>
>> Changes since v1:
>>
>>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>>    to dscho's suggestion, I'm still implementing the Windows version in the
>>    same patch and I'm not doing autoconf detection since this is a POSIX
>>    function.

While it is in POSIX.1-2008, this function is not available on every single POSIX-compliant platform. Please make sure that the code will not cause breakage on some platforms, in particular the ones I maintain. Neither futimes nor futimens is available on either NonStop ia64 or x86. The platform only has utime, so this needs to be wrapped with an option in config.mak.uname.

Thanks,
Randall



^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-07 19:54     ` Randall S. Becker
@ 2021-09-08  0:54       ` Neeraj Singh
  2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-08  0:54 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
<rsbecker@nexbridge.com> wrote:
>
> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >>
> >> Thanks to everyone for review so far! I've responded to the previous
> >> feedback and changed the patch series a bit.
> >>
> >> Changes since v1:
> >>
> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
> >>    to dscho's suggestion, I'm still implementing the Windows version in the
> >>    same patch and I'm not doing autoconf detection since this is a POSIX
> >>    function.
>
> While it is in POSIX.1-2008, this function is not available on every single POSIX-compliant platform. Please make sure that the code will not cause breakage on some platforms, in particular the ones I maintain. Neither futimes nor futimens is available on either NonStop ia64 or x86. The platform only has utime, so this needs to be wrapped with an option in config.mak.uname.
>
> Thanks,
> Randall

Ugh. Fair enough.  How do other contributors feel about me moving back
to utime, but instead just doing the utime over in
builtin/pack-objects.c?  The idea would be to eliminate the mtime
logic entirely from write_loose_object and just do it at the top-level
in loosen_unused_packed_objects.

Thanks,
Neeraj

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
  2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
  2021-09-07 19:54     ` Randall S. Becker
@ 2021-09-08  0:55     ` Neeraj Singh
  2021-09-08  6:44       ` Junio C Hamano
  2 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-08  0:55 UTC (permalink / raw)
  To: Neeraj K. Singh via GitGitGadget
  Cc: Git List, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh

On Tue, Sep 7, 2021 at 12:44 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> Hello everyone,
> I'd like to bump this review up in people's inboxes since Patch V2
> hasn't gotten any traction in over a week.
>
> Thanks in advance for taking a look,
> - Neeraj Singh
> Windows Core Filesystems Team

BTW, I updated the github PR to enable batch mode everywhere, and all
the tests passed, which is good news to me.

Thanks,
Neeraj

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  0:54       ` Neeraj Singh
@ 2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
  2021-09-08 14:04           ` Randall S. Becker
  2021-09-08 19:01           ` Neeraj Singh
  0 siblings, 2 replies; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08  1:22 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Randall S. Becker, Neeraj K. Singh via GitGitGadget, Git List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Neeraj K. Singh


On Tue, Sep 07 2021, Neeraj Singh wrote:

> On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
> <rsbecker@nexbridge.com> wrote:
>>
>> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
>> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> >>
>> >> Thanks to everyone for review so far! I've responded to the previous
>> >> feedback and changed the patch series a bit.
>> >>
>> >> Changes since v1:
>> >>
>> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>> >>    to dscho's suggestion, I'm still implementing the Windows version in the
>> >>    same patch and I'm not doing autoconf detection since this is a POSIX
>> >>    function.
>>
>> While it is in POSIX.1-2008, this function is not available on every single
>> POSIX-compliant platform. Please make sure that the code will not
>> cause a breakage on some platforms - the ones I maintain, in
>> particular. Neither futimes nor futimens is available on either
>> NonStop ia64 or x86. The platform only has utime, so this needs to
>> be wrapped with an option in config.mak.uname.
>>
>> Thanks,
>> Randall
>
> Ugh. Fair enough.  How do other contributors feel about me moving back
> to utime, but instead just doing the utime over in
> builtin/pack-objects.c?  The idea would be to eliminate the mtime
> logic entirely from write_loose_object and just do it at the top-level
> in loosen_unused_packed_objects.

Aside from where it lives, can't we just have a wrapper that takes both
the filename & fd, and then on some platforms will need to dispatch to a
slower filename-only version, but can hopefully use the new fd-accepting
function?

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  0:55     ` Neeraj Singh
@ 2021-09-08  6:44       ` Junio C Hamano
  2021-09-08  6:49         ` Christoph Hellwig
  2021-09-08 16:34         ` Neeraj Singh
  0 siblings, 2 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-09-08  6:44 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

Neeraj Singh <nksingh85@gmail.com> writes:

> BTW, I updated the github PR to enable batch mode everywhere, and all
> the tests passed, which is good news to me.

I doubt that fsyncObjectFiles is something we can reliably test in
CI, either with the new batched thing or with the original "when we
close one, make sure the changes hit the disk platter" approach.  So
I am not sure what conclusion we should draw from such an experiment,
other than "ok, it compiles cleanly."  After all, unless we cause
system crashes, data we have written and then closed with close(2)
would be seen by another process that we spawn after that, with or
without sync, no?
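The point above can be demonstrated directly: data written with write(2) and closed without any fsync(2) is immediately readable by a process spawned afterwards, because POSIX requires reads to see completed writes and both processes go through the same page cache. Only a crash (or a simulated one) distinguishes synced from unsynced data. A minimal sketch; the function name and file path are arbitrary.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write msg to path WITHOUT fsync(), close it, then fork a child that
 * reads the file back.  Returns 1 if the child saw exactly msg, 0 if
 * not, -1 on error.  msg must be shorter than 256 bytes for the check
 * to succeed.
 */
static int visible_without_fsync(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	pid_t pid;
	int status;

	if (fd < 0)
		return -1;
	if (write(fd, msg, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	close(fd); /* note: no fsync() anywhere */

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		char buf[256];
		ssize_t got;
		int rfd = open(path, O_RDONLY);

		if (rfd < 0)
			_exit(2);
		got = read(rfd, buf, sizeof(buf));
		close(rfd);
		_exit(got == (ssize_t)len && memcmp(buf, msg, len) == 0 ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

This is why a test suite passing with batch mode enabled says little about durability: the same check passes whether or not anything was flushed.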




^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  6:44       ` Junio C Hamano
@ 2021-09-08  6:49         ` Christoph Hellwig
  2021-09-08 13:57           ` Randall S. Becker
  2021-09-08 16:34         ` Neeraj Singh
  1 sibling, 1 reply; 160+ messages in thread
From: Christoph Hellwig @ 2021-09-08  6:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj Singh, Neeraj K. Singh via GitGitGadget, Git List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh

On Tue, Sep 07, 2021 at 11:44:52PM -0700, Junio C Hamano wrote:
> I doubt that fsyncObjectFiles is something we can reliably test in
> CI, either with the new batched thing or with the original "when we
> close one, make sure the changes hit the disk platter" approach.  So
> I am not sure what conclusion we should draw from such an experiment,
> other than "ok, it compiles cleanly."  After all, unless we cause
> system crashes, what we thought we have written and close(2) would
> be seen by another process that we spawn after that, with or without
> sync, no?

Basically yes.  XFS on Linux has shutdown ioctls that allow simulating
such a crash by shutting the file system down, which really helps when
debugging that kind of code.  A bunch of other file systems (ext4, f2fs)
have also picked this up now (grep for {XFS,EXT4,F2FS}_IOC_SHUTDOWN).

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  6:49         ` Christoph Hellwig
@ 2021-09-08 13:57           ` Randall S. Becker
  2021-09-08 14:13             ` 'Christoph Hellwig'
  0 siblings, 1 reply; 160+ messages in thread
From: Randall S. Becker @ 2021-09-08 13:57 UTC (permalink / raw)
  To: 'Christoph Hellwig', 'Junio C Hamano'
  Cc: 'Neeraj Singh',
	'Neeraj K. Singh via GitGitGadget', 'Git List',
	'Johannes Schindelin', 'Jeff King',
	'Jeff Hostetler',
	'Ævar Arnfjörð Bjarmason',
	'Neeraj K. Singh'

On September 8, 2021 2:50 AM, Christoph Hellwig wrote:
>To: Junio C Hamano <gitster@pobox.com>
>Cc: Neeraj Singh <nksingh85@gmail.com>; Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com>; Git List <git@vger.kernel.org>;
>Johannes Schindelin <Johannes.Schindelin@gmx.de>; Jeff King <peff@peff.net>; Jeff Hostetler <jeffhost@microsoft.com>;
>Christoph Hellwig <hch@lst.de>; Ævar Arnfjörð Bjarmason <avarab@gmail.com>; Neeraj K. Singh <neerajsi@microsoft.com>
>Subject: Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
>
>On Tue, Sep 07, 2021 at 11:44:52PM -0700, Junio C Hamano wrote:
>> I doubt that fsyncObjectFiles is something we can reliably test in CI,
>> either with the new batched thing or with the original "when we close
>> one, make sure the changes hit the disk platter" approach.  So I am
>> not sure what conclusion we should draw from such an experiment, other
>> than "ok, it compiles cleanly."  After all, unless we cause system
>> crashes, what we thought we have written and close(2) would be seen by
>> another process that we spawn after that, with or without sync, no?
>
>Basically yes.  XFS on Linux has shutdown ioctls that allow to simulate
>that crash by shutting the file system down which really helps
>debugging that kind of code.  A bunch of other file systems (ext4, f2fs)
>have also picked this up now (grep for {XFS,EXT4,F2FS}_IOC_SHUTDOWN).

I strongly doubt this concept will work in an MPP architecture,
particularly one where "shutting the file system down" is not possible.
I know of at least 3 operating systems where that is a bad plan, and if
you did, you would take the test suite down while you were at it.
-Randall

* RE: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
@ 2021-09-08 14:04           ` Randall S. Becker
  2021-09-08 19:01           ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Randall S. Becker @ 2021-09-08 14:04 UTC (permalink / raw)
  To: 'Ævar Arnfjörð Bjarmason', 'Neeraj Singh'
  Cc: 'Neeraj K. Singh via GitGitGadget', 'Git List',
	'Johannes Schindelin', 'Jeff King',
	'Jeff Hostetler', 'Christoph Hellwig',
	'Neeraj K. Singh'

On September 7, 2021 9:23 PM, Ævar Arnfjörð Bjarmason wrote:
>Subject: Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
>
>
>On Tue, Sep 07 2021, Neeraj Singh wrote:
>
>> On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
>> <rsbecker@nexbridge.com> wrote:
>>>
>>> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
>>> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>> >>
>>> >> Thanks to everyone for review so far! I've responded to the
>>> >> previous feedback and changed the patch series a bit.
>>> >>
>>> >> Changes since v1:
>>> >>
>>> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
>>> >>    to dscho's suggestion, I'm still implementing the Windows version in the
>>> >>    same patch and I'm not doing autoconf detection since this is a POSIX
>>> >>    function.
>>>
>>> While POSIX.1-2008, this function is not available on every single
>>> POSIX-compliant platform. Please make sure that the code will not
>>> cause a breakage on some platforms - the ones I maintain, in
>>> particular. Neither futimes nor futimens is available on either
>>> NonStop ia64 or x86. The platform only has utime, so this needs to be
>>> wrapped with an option in config.mak.uname.
>>>
>>> Thanks,
>>> Randall
>>
>> Ugh. Fair enough.  How do other contributors feel about me moving back
>> to utime, but instead just doing the utime over in
>> builtins/pack-objects.c?  The idea would be to eliminate the mtime
>> logic entirely from write_loose_object and just do it at the top-level
>> in loosen_unused_packed_objects.
>
>Aside from where it lives, can't we just have a wrapper that takes both the filename & fd, and then on some platforms will need to
>dispatch to a slower filename-only version, but can hopefully use the new fd-accepting function?

I'm not really enamoured with this direction at all. It means that any platform might have to skip a version of git once the patches are applied (because a wrapper that does not compile there would break the build), unless the patches for all of those platforms are included. Even adding a Makefile option would have a similar effect. This should be an "enable if supported" feature, not a default-or-broken one. At best, I'd have to watch for when the patch is applied and hope I can figure out the wrapper changes (around my $DAYJOB) in time to make the same release. This seems a bit counter to a "keeping things compatible" philosophy. Maybe there's something I'm missing here.
-Randall

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08 13:57           ` Randall S. Becker
@ 2021-09-08 14:13             ` 'Christoph Hellwig'
  2021-09-08 14:25               ` Randall S. Becker
  0 siblings, 1 reply; 160+ messages in thread
From: 'Christoph Hellwig' @ 2021-09-08 14:13 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: 'Christoph Hellwig', 'Junio C Hamano',
	'Neeraj Singh',
	'Neeraj K. Singh via GitGitGadget', 'Git List',
	'Johannes Schindelin', 'Jeff King',
	'Jeff Hostetler',
	'Ævar Arnfjörð Bjarmason',
	'Neeraj K. Singh'

On Wed, Sep 08, 2021 at 09:57:34AM -0400, Randall S. Becker wrote:
> possible. I know of at least 3 operating systems where that is a bad plan, and if you did, you would take the test suite down while
> you were at it.

I've just mentioned a good way to write a test for this feature on a
specific platform.  This is absolutely no judgement on whether that is a
good plan on other platforms.

* RE: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08 14:13             ` 'Christoph Hellwig'
@ 2021-09-08 14:25               ` Randall S. Becker
  0 siblings, 0 replies; 160+ messages in thread
From: Randall S. Becker @ 2021-09-08 14:25 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: 'Junio C Hamano', 'Neeraj Singh',
	'Neeraj K. Singh via GitGitGadget', 'Git List',
	'Johannes Schindelin', 'Jeff King',
	'Jeff Hostetler',
	'Ævar Arnfjörð Bjarmason',
	'Neeraj K. Singh'

On September 8, 2021 10:13 AM, Christoph Hellwig wrote:
>Subject: Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
>
>On Wed, Sep 08, 2021 at 09:57:34AM -0400, Randall S. Becker wrote:
>> possible. I know of at least 3 operating systems where that is a bad
>> plan, and if you did, you would take the test suite down while you were at it.
>
>I've just mentioned a good way to write a test for this feature on a specific platform.  This is absolutely no judgement on whether that is
>a good plan on other platforms.

Thank you for the clarification. I do appreciate it.
-Randall

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  6:44       ` Junio C Hamano
  2021-09-08  6:49         ` Christoph Hellwig
@ 2021-09-08 16:34         ` Neeraj Singh
  2021-09-08 19:12           ` Junio C Hamano
  2021-09-08 19:23           ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-08 16:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> > BTW, I updated the github PR to enable batch mode everywhere, and all
> > the tests passed, which is good news to me.
>
> I doubt that fsyncObjectFiles is something we can reliably test in
> CI, either with the new batched thing or with the original "when we
> close one, make sure the changes hit the disk platter" approach.  So
> I am not sure what conclusion we should draw from such an experiment,
> other than "ok, it compiles cleanly."  After all, unless we cause
> system crashes, what we thought we have written and close(2) would
> be seen by another process that we spawn after that, with or without
> sync, no?

The main failure mode I was worried about is that some test or other part
of Git is relying on a loose object being immediately available after it is
added to the ODB. With batch mode, the loose objects aren't actually
available until the bulk checkin is unplugged.

I agree that it is not easy to test whether the data is actually going to
durable storage at the expected time.  FWIW, I did take a disk IO trace on
Windows to verify that we are issuing disk writes and flushes at the right
time. But that's a one-time test that would be hard to automate.

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
  2021-09-08 14:04           ` Randall S. Becker
@ 2021-09-08 19:01           ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-08 19:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Randall S. Becker, Neeraj K. Singh via GitGitGadget, Git List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Neeraj K. Singh

On Tue, Sep 7, 2021 at 6:23 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Tue, Sep 07 2021, Neeraj Singh wrote:
>
> > On Tue, Sep 7, 2021 at 12:54 PM Randall S. Becker
> > <rsbecker@nexbridge.com> wrote:
> >>
> >> On September 7, 2021 3:44 PM, Neeraj Singh wrote:
> >> >On Fri, Aug 27, 2021 at 4:49 PM Neeraj K. Singh via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >> >>
> >> >> Thanks to everyone for review so far! I've responded to the previous
> >> >> feedback and changed the patch series a bit.
> >> >>
> >> >> Changes since v1:
> >> >>
> >> >>  * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
> >> >>    to dscho's suggestion, I'm still implementing the Windows version in the
> >> >>    same patch and I'm not doing autoconf detection since this is a POSIX
> >> >>    function.
> >>
> >> While POSIX.1-2008, this function is not available on every single
> >> POSIX-compliant platform. Please make sure that the code will not
> >> cause a breakage on some platforms - the ones I maintain, in
> >> particular. Neither futimes nor futimens is available on either
> >> NonStop ia64 or x86. The platform only has utime, so this needs to
> >> be wrapped with an option in config.mak.uname.
> >>
> >> Thanks,
> >> Randall
> >
> > Ugh. Fair enough.  How do other contributors feel about me moving back
> > to utime, but instead just doing the utime over in
> > builtins/pack-objects.c?  The idea would be to eliminate the mtime
> > logic entirely from write_loose_object and just do it at the top-level
> > in loosen_unused_packed_objects.
>
> Aside from where it lives, can't we just have a wrapper that takes both
> the filename & fd, and then on some platforms will need to dispatch to a
> slower filename-only version, but can hopefully use the new fd-accepting
> function?

I had some concerns around using utime() while a file descriptor is open.
There's some risk of a sharing violation on Windows (which doesn't matter there,
since we'd be using futimens), but I was also concerned that there might be some
OSes that update the mtime on close(fd), thus overwriting the effects of utime.
Maybe that's an unwarranted concern, but it's part of why I didn't want to have
different call sequences on different OSes.

I'd be happy to implement your suggestion though and see what happens. But I
also feel that this time update thing is pretty ancillary to the real goal of
my change. I'm only doing it because it's in the same area. The effects of
getting mtime wrong would be pretty subtle -- I think we'd just not be deleting
some unpacked unreachable objects as soon as expected.  Do you have a strong
objection to lifting the time update logic out?
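
Ævar's suggestion of a wrapper that takes both the filename and the fd could look roughly like the minimal sketch below. The helper name `set_file_times()` and the `HAVE_FUTIMENS` build knob are hypothetical (neither is part of Git); the assumption is that a platform which only has utime(), such as NonStop, would leave the knob undefined and fall back to the path-based call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>
#include <utime.h>

/*
 * Hypothetical helper (not Git's API): set atime and mtime of a loose
 * object file. HAVE_FUTIMENS is an assumed per-platform build knob; a
 * platform that only provides utime() leaves it undefined and takes the
 * path-based branch instead.
 */
static int set_file_times(int fd, const char *path, time_t mtime)
{
#ifdef HAVE_FUTIMENS
	struct timespec ts[2];

	(void)path;
	ts[0].tv_sec = mtime;
	ts[0].tv_nsec = 0;
	ts[1] = ts[0]; /* same value for atime and mtime */
	return futimens(fd, ts);
#else
	struct utimbuf utb;

	(void)fd;
	utb.actime = mtime;
	utb.modtime = mtime;
	return utime(path, &utb);
#endif
}
```

The sketch leaves the ordering relative to close(2) to the caller; in v3 of the series the utime() call is made after close(fd), which sidesteps the concern above about an OS updating mtime on close.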

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08 16:34         ` Neeraj Singh
@ 2021-09-08 19:12           ` Junio C Hamano
  2021-09-08 19:20             ` Neeraj Singh
  2021-09-08 19:23           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 160+ messages in thread
From: Junio C Hamano @ 2021-09-08 19:12 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

Neeraj Singh <nksingh85@gmail.com> writes:

> On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Neeraj Singh <nksingh85@gmail.com> writes:
>>
>> > BTW, I updated the github PR to enable batch mode everywhere, and all
>> > the tests passed, which is good news to me.
>>
>> I doubt that fsyncObjectFiles is something we can reliably test in
>> CI, either with the new batched thing or with the original "when we
>> close one, make sure the changes hit the disk platter" approach.  So
>> I am not sure what conclusion we should draw from such an experiment,
>> other than "ok, it compiles cleanly."  After all, unless we cause
>> system crashes, what we thought we have written and close(2) would
>> be seen by another process that we spawn after that, with or without
>> sync, no?
>
> The main failure mode I was worried about is that some test or other part
> of Git is relying on a loose object being immediately available after it is
> added to the ODB. With batch mode, the loose objects aren't actually
> available until the bulk checkin is unplugged.

Ah, I see.  If there are two processes that communicate over pipes
to decide whose turn it is (perhaps a producer of data that feeds
fast-import may wait for fast-import to say "I gave this label to
the object you requested" and goes ahead to use that object), and at
the point that the "other" process takes its turn, if the objects
are not "flushed" yet, things can break.  That's a valid concern.
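
To make the hazard concrete, here is a minimal sketch of deferred renames, assuming a simple fixed-size array of pending (temporary, final) path pairs. `defer_rename()` and `flush_pending_renames()` are hypothetical names; the actual series keeps this state in a string-list inside bulk-checkin.c. Until the flush runs, a cooperating process that looks the object up under its final name will not find it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING 64 /* arbitrary limit for the sketch */

/* Hypothetical stand-in for the rename list kept by bulk-checkin. */
struct pending_rename {
	char *tmp;
	char *dst;
};

static struct pending_rename pending[MAX_PENDING];
static int nr_pending;

/* Batch mode: remember the rename instead of performing it now. */
static int defer_rename(const char *tmp, const char *dst)
{
	char *t, *d;

	if (nr_pending == MAX_PENDING)
		return -1;
	t = strdup(tmp);
	d = strdup(dst);
	if (!t || !d) {
		free(t);
		free(d);
		return -1;
	}
	pending[nr_pending].tmp = t;
	pending[nr_pending].dst = d;
	nr_pending++;
	return 0;
}

/* "Unplug": publish every deferred object under its final name. */
static int flush_pending_renames(void)
{
	int ret = 0;

	for (int i = 0; i < nr_pending; i++) {
		if (rename(pending[i].tmp, pending[i].dst))
			ret = -1;
		free(pending[i].tmp);
		free(pending[i].dst);
	}
	nr_pending = 0;
	return ret;
}
```

Any consumer that checks for the object between `defer_rename()` and `flush_pending_renames()` sees it as missing, which is exactly the window a pipe-coordinated producer/consumer pair could hit.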

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08 19:12           ` Junio C Hamano
@ 2021-09-08 19:20             ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-08 19:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Neeraj K. Singh

On Wed, Sep 8, 2021 at 12:12 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Neeraj Singh <nksingh85@gmail.com> writes:
>
> > On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Neeraj Singh <nksingh85@gmail.com> writes:
> >>
> >> > BTW, I updated the github PR to enable batch mode everywhere, and all
> >> > the tests passed, which is good news to me.
> >>
> >> I doubt that fsyncObjectFiles is something we can reliably test in
> >> CI, either with the new batched thing or with the original "when we
> >> close one, make sure the changes hit the disk platter" approach.  So
> >> I am not sure what conclusion we should draw from such an experiment,
> >> other than "ok, it compiles cleanly."  After all, unless we cause
> >> system crashes, what we thought we have written and close(2) would
> >> be seen by another process that we spawn after that, with or without
> >> sync, no?
> >
> > The main failure mode I was worried about is that some test or other part
> > of Git is relying on a loose object being immediately available after it is
> > added to the ODB. With batch mode, the loose objects aren't actually
> > available until the bulk checkin is unplugged.
>
> Ah, I see.  If there are two processes that communicate over pipes
> to decide whose turn it is (perhaps a producer of data that feeds
> fast-import may wait for fast-import to say "I gave this label to
> the object you requested" and goes ahead to use that object), and at
> the point that the "other" process takes its turn, if the objects
> are not "flushed" yet, things can break.  That's a valid concern.

That's right. This appears to be a possibility in the existing bulk
checkin code that produces packfiles for large objects as well, but
my change makes the situation much more common.

* Re: [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-08 16:34         ` Neeraj Singh
  2021-09-08 19:12           ` Junio C Hamano
@ 2021-09-08 19:23           ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-08 19:23 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Junio C Hamano, Neeraj K. Singh via GitGitGadget, Git List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Neeraj K. Singh


On Wed, Sep 08 2021, Neeraj Singh wrote:

> On Tue, Sep 7, 2021 at 11:44 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Neeraj Singh <nksingh85@gmail.com> writes:
>>
>> > BTW, I updated the github PR to enable batch mode everywhere, and all
>> > the tests passed, which is good news to me.
>>
>> I doubt that fsyncObjectFiles is something we can reliably test in
>> CI, either with the new batched thing or with the original "when we
>> close one, make sure the changes hit the disk platter" approach.  So
>> I am not sure what conclusion we should draw from such an experiment,
>> other than "ok, it compiles cleanly."  After all, unless we cause
>> system crashes, what we thought we have written and close(2) would
>> be seen by another process that we spawn after that, with or without
>> sync, no?
>
> The main failure mode I was worried about is that some test or other part
> of Git is relying on a loose object being immediately available after it is
> added to the ODB. With batch mode, the loose objects aren't actually
> available until the bulk checkin is unplugged.
>
> I agree that it is not easy to test whether the data is actually going to
> durable storage at the expected time.  FWIW, I did take a disk IO trace on
> Windows to verify that we are issuing disk writes and flushes at the right
> time. But that's a one-time test that would be hard to automate.

I have some semi-related patches I need to dig up and finish sometime
which add a "git gc" test mode to the test suite, i.e. any time we call
"git gc --auto" it will go ahead and actually run, and some adversarial
options to run always, right away, prune with --expire=now. It found
some false positives, but also some genuine races and bugs at the time.

Similarly, I think a good longer-term goal for better fsync() and data
integrity in git is to refactor the various codepaths where we write to
disk (grepping for fsync_or_die() is a good start to find those) so that
they all live in one place; we could then easily instrument that code to
run in a hostile test mode.

E.g. make anything that expects to write out a "foo" file actually write
out "foo.not-synced-yet" as long as fsync() etc. hasn't been called, or
with signals/timers/atexit() handlers fake up known FS edge cases such
as a write of "foo" only renaming "foo.not-synced-yet" to "foo" 1s after
the last close() call not followed by an fsync, etc.
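
The first half of that hostile mode, where a file only appears under its real name once fsync() has been called, might be sketched as below. Everything here (`struct hostile_file`, `hostile_open()`, `hostile_fsync()`, `hostile_close()`) is hypothetical and does not exist in Git; the delayed-rename timer variant is not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical test-mode file handle; nothing like this exists in Git. */
struct hostile_file {
	int fd;
	char final_path[4096];
	char staging_path[4096];
};

/* Open "<path>.not-synced-yet" instead of <path>. */
static int hostile_open(struct hostile_file *hf, const char *path)
{
	int n = snprintf(hf->final_path, sizeof(hf->final_path), "%s", path);

	if (n < 0 || (size_t)n >= sizeof(hf->final_path))
		return -1;
	n = snprintf(hf->staging_path, sizeof(hf->staging_path),
		     "%s.not-synced-yet", path);
	if (n < 0 || (size_t)n >= sizeof(hf->staging_path))
		return -1;
	hf->fd = open(hf->staging_path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	return hf->fd < 0 ? -1 : 0;
}

/* Only an explicit fsync "publishes" the file under its real name. */
static int hostile_fsync(struct hostile_file *hf)
{
	if (fsync(hf->fd))
		return -1;
	return rename(hf->staging_path, hf->final_path);
}

/* Closing without a prior fsync leaves only the staging name behind. */
static int hostile_close(struct hostile_file *hf)
{
	return close(hf->fd);
}
```

Code that forgets to fsync before relying on the final name would then fail deterministically in the test suite instead of only after a real crash.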

Anyway, I expect, given your occupation, that you may have better ideas in
that area; presumably, instrumenting and testing behavior under I/O
pressure, deferred syncs, etc. is something mature filesystems need to deal
with as part of their own regression tests...

1. https://lore.kernel.org/git/cover-v2-0.4-0000000000-20210908T003631Z-avarab@gmail.com/

* [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
@ 2021-09-14  3:38   ` Neeraj K. Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                       ` (7 more replies)
  7 siblings, 8 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh

Thanks to everyone for review so far!

Changes since v2:

 * Removed an unused Makefile define (FSYNC_DOESNT_FLUSH) that slipped in
   from an intermediate change.

 * Drop the futimens part of the patch and return to just calling utime, now
   within the new bulk_checkin code. The utime to futimens change seemed to
   be problematic for some platforms (thanks Randall Becker), and is really
   orthogonal to the rest of the patch series.

 * (Optional commit) Enable batch mode by default so that we can shake loose
   any issues relating to deferring the renames until the
   unplug_bulk_checkin.

Changes since v1:

 * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
   to dscho's suggestion, I'm still implementing the Windows version in the
   same patch and I'm not doing autoconf detection since this is a POSIX
   function.

 * Introduce a separate preparatory patch to the bulk-checkin infrastructure
   to separate the 'plugged' variable and rename the 'state' variable, as
   suggested by dscho.

 * Add performance numbers to the commit message of the main bulk fsync
   patch, as suggested by dscho.

 * Add a comment about the non-thread-safety of the bulk-checkin
   infrastructure, as suggested by avarab.

 * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
   by dscho and avarab and others.

 * Add more details to Documentation/config/core.txt about the various
   settings and their intended effects, as suggested by avarab.

 * Switch to the string-list API to hold the rename state, as suggested by
   avarab.

 * Create a separate update-index patch to use bulk-checkin as suggested by
   dscho.

 * Add Windows support in the upstream git. This is done in a way that
   should not conflict with git-for-windows.

 * Add new performance tests that show the delta based on fsync mode.

NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
on Linux, since sync_file_range does not provide data integrity guarantees.
There is currently no kernel interface suitable to achieve disk flush
batching as is, but he suggested that he might implement a 'syncfs' variant
on top of this patchset. This code is still useful on macOS and Windows, and
the config documentation makes that clear.

Neeraj Singh (6):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: performance tests for add and stash
  core.fsyncobjectfiles: enable batch mode for testing

 Documentation/config/core.txt       |  26 +++++--
 Makefile                            |   6 ++
 builtin/add.c                       |   3 +-
 builtin/update-index.c              |   3 +
 bulk-checkin.c                      | 103 +++++++++++++++++++++++++---
 bulk-checkin.h                      |   5 +-
 cache.h                             |   8 ++-
 compat/mingw.h                      |   3 +
 compat/win32/flush.c                |  29 ++++++++
 config.c                            |   8 ++-
 config.mak.uname                    |   3 +
 configure.ac                        |   8 +++
 contrib/buildsystems/CMakeLists.txt |   3 +-
 environment.c                       |   2 +-
 git-compat-util.h                   |   7 ++
 object-file.c                       |  22 +-----
 t/perf/lib-unique-files.sh          |  32 +++++++++
 t/perf/p3700-add.sh                 |  43 ++++++++++++
 t/perf/p3900-stash.sh               |  46 +++++++++++++
 wrapper.c                           |  40 +++++++++++
 write-or-die.c                      |   2 +-
 21 files changed, 358 insertions(+), 44 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/perf/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v3
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v2:

 1:  fc3d5a7b635 < -:  ----------- object-file: use futimens rather than utime
 2:  49f72800bfb = 1:  d5893e28df1 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 3:  2c1c907b12a ! 2:  f8b5b709e9e core.fsyncobjectfiles: batched disk flushes
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
      +}
      +
      +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
     -+					      const char *filename)
     ++					      const char *filename, time_t mtime)
      +{
     ++	int do_finalize = 1;
     ++	int ret = 0;
     ++
      +	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
      +		/*
      +		 * If we have a plugged bulk checkin, we issue a call that
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
      +		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
      +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
      +			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
     -+			if (close(fd))
     -+				die_errno(_("error when closing loose object file"));
     -+
     -+			return 0;
     ++			do_finalize = 0;
      +
      +		} else {
      +			fsync_or_die(fd, "loose object file");
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
      +	if (close(fd))
      +		die_errno(_("error when closing loose object file"));
      +
     -+	return finalize_object_file(tmpfile, filename);
     ++	if (mtime) {
     ++		struct utimbuf utb;
     ++		utb.actime = mtime;
     ++		utb.modtime = mtime;
     ++		if (utime(tmpfile, &utb) < 0)
     ++			warning_errno(_("failed utime() on %s"), tmpfile);
     ++	}
     ++
     ++	if (do_finalize)
     ++		ret = finalize_object_file(tmpfile, filename);
     ++
     ++	return ret;
      +}
      +
       int index_bulk_checkin(struct object_id *oid,
     @@ bulk-checkin.h
       
       #include "cache.h"
       
     -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile, const char *filename);
     ++int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
     ++					      const char *filename, time_t mtime);
      +
       int index_bulk_checkin(struct object_id *oid,
       		       int fd, size_t size, enum object_type type,
     @@ config.mak.uname: ifeq ($(uname_S),Linux)
       	HAVE_GETDELIM = YesPlease
       	SANE_TEXT_GREP=-a
       	FREAD_READS_DIRECTORIES = UnfortunatelyYes
     -@@ config.mak.uname: ifeq ($(uname_S),Darwin)
     - 	COMPAT_OBJS += compat/precompose_utf8.o
     - 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
     - 	BASIC_CFLAGS += -DPROTECT_HFS_DEFAULT=1
     -+	BASIC_CFLAGS += -DFSYNC_DOESNT_FLUSH=1
     - 	HAVE_BSD_SYSCTL = YesPlease
     - 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
     - 	HAVE_NS_GET_EXECUTABLE_PATH = YesPlease
      
       ## configure.ac ##
      @@ configure.ac: AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
     @@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void
       }
       
      -/* Finalize a file on disk, and close it. */
     --static int close_loose_object(int fd, const char *tmpfile, const char *filename)
     +-static void close_loose_object(int fd)
      -{
      -	if (fsync_object_files)
      -		fsync_or_die(fd, "loose object file");
      -	if (close(fd) != 0)
      -		die_errno(_("error when closing loose object file"));
     --	return finalize_object_file(tmpfile, filename);
      -}
      -
       /* Size of directory component, including the ending '/' */
       static inline int directory_size(const char *filename)
       {
      @@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
     - 			warning_errno(_("failed futimes() on %s"), tmp_file.buf);
     - 	}
     + 		die(_("confused by unstable object source data for %s"),
     + 		    oid_to_hex(oid));
       
     --	return close_loose_object(fd, tmp_file.buf, filename.buf);
     -+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf, filename.buf);
     +-	close_loose_object(fd);
     +-
     +-	if (mtime) {
     +-		struct utimbuf utb;
     +-		utb.actime = mtime;
     +-		utb.modtime = mtime;
     +-		if (utime(tmp_file.buf, &utb) < 0)
     +-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
     +-	}
     +-
     +-	return finalize_object_file(tmp_file.buf, filename.buf);
     ++	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
     ++							 filename.buf, mtime);
       }
       
       static int freshen_loose_object(const struct object_id *oid)
 4:  546ad9c82e8 = 3:  815a862e229 core.fsyncobjectfiles: add windows support for batch mode
 5:  d8843185fe4 = 4:  6b576038986 update-index: use the bulk-checkin infrastructure
 6:  73b5d41be94 = 5:  b7ca3ba9302 core.fsyncobjectfiles: performance tests for add and stash
 -:  ----------- > 6:  55a40fc8fd5 core.fsyncobjectfiles: enable batch mode for testing

-- 
gitgitgadget

* [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
@ 2021-09-14  3:38     ` Neeraj Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'.  This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
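
The bug described in the second bullet can be reproduced in isolation. This is a simplified model, assuming (as the commit message says) that finish_bulk_checkin() resets the state by zeroing the whole struct; the struct and function names below are hypothetical stand-ins rather than the real bulk-checkin.c types.

```c
#include <assert.h>
#include <string.h>

/* Before the patch: the flag lives inside the state that gets zeroed. */
struct old_checkin_state {
	unsigned plugged:1;
	unsigned nr_written;
};

/* After the patch: the flag is a separate static outside the struct. */
static int checkin_plugged;

struct new_checkin_state {
	unsigned nr_written;
};

/* Models the reset at the end of finish_bulk_checkin() (old layout). */
static void finish_old(struct old_checkin_state *s)
{
	memset(s, 0, sizeof(*s));
}

/* Same reset with the new layout; checkin_plugged is untouched. */
static void finish_new(struct new_checkin_state *s)
{
	memset(s, 0, sizeof(*s));
}
```

With the old layout, finishing one packfile while still plugged silently clears `plugged`; with the flag moved out, the reset only clears per-pack data.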

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget

* [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-09-14  3:38     ` Neeraj Singh via GitGitGadget
  2021-09-14 10:39       ` Bagas Sanjaya
  2021-09-14 19:34       ` Junio C Hamano
  2021-09-14  3:38     ` [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
                       ` (5 subsequent siblings)
  7 siblings, 2 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
macOS, and Linux each offer mechanisms to write data from the filesystem
page cache without initiating a hardware flush.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
takes advantage of the bulk-checkin infrastructure to batch up hardware
flushes.
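The new value turns a boolean setting into a tri-state one. A minimal sketch of that parsing, assuming a hypothetical parse_fsync_mode() and enum fsync_mode (the patch itself checks for "batch" with strcasecmp() and otherwise defers to git_config_bool(), which accepts more spellings than this subset, such as arbitrary integers):

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp() */

enum fsync_mode { FSYNC_MODE_OFF, FSYNC_MODE_ON, FSYNC_MODE_BATCH };

/*
 * Hypothetical parser for the tri-state value. "batch" (any case)
 * selects batch mode; a subset of Git's boolean spellings maps to on/off.
 * Returns 0 on success, -1 on an unrecognized or missing value.
 */
static int parse_fsync_mode(const char *value, enum fsync_mode *out)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0", "" };
	size_t i;

	assert(out);
	if (!value)
		return -1; /* the patch reports config_error_nonbool() here */
	if (!strcasecmp(value, "batch")) {
		*out = FSYNC_MODE_BATCH;
		return 0;
	}
	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++) {
		if (!strcasecmp(value, truthy[i])) {
			*out = FSYNC_MODE_ON;
			return 0;
		}
	}
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++) {
		if (!strcasecmp(value, falsy[i])) {
			*out = FSYNC_MODE_OFF;
			return 0;
		}
	}
	return -1;
}
```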

When the new mode is enabled we do the following for new objects:

1. Create a tmp_obj_XXXX file and write the object data to it.
2. Issue a pagecache writeback request and wait for it to complete.
3. Record the tmp name and the final name in the bulk-checkin state for
   later rename.

At the end of the entire transaction we:
1. Issue an fsync against the lock file to flush the hardware writeback
   cache, which should by now have processed the tmp file writes.
2. Rename all of the temp files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation.
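The end-of-transaction steps can be sketched in portable POSIX C. This is a minimal sketch with hypothetical names (rename_batch, batch_add, batch_commit): it uses a growable array where the patch uses a string_list, and plain rename() where the patch calls finalize_object_file(). Like the commit message, it assumes that one flush on the lock file also makes the earlier written-out temp files durable; later in this thread that assumption is noted not to hold on Linux.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>  /* rename(), fileno() */
#include <stdlib.h>
#include <string.h>
#include <unistd.h> /* fsync() */

/* Hypothetical deferred-rename queue; not Git's string_list-based code. */
struct pending_rename {
	char *src; /* temporary file, e.g. .../tmp_obj_XXXXXX */
	char *dst; /* final loose-object path */
};

struct rename_batch {
	struct pending_rename *items;
	size_t nr, alloc;
};

/* Step 3 of the per-object work: record tmp and final names for later. */
static int batch_add(struct rename_batch *b, const char *src, const char *dst)
{
	char *s, *d;

	if (b->nr == b->alloc) {
		size_t n = b->alloc ? 2 * b->alloc : 8;
		struct pending_rename *p = realloc(b->items, n * sizeof(*p));
		if (!p)
			return -1;
		b->items = p;
		b->alloc = n;
	}
	s = strdup(src);
	d = strdup(dst);
	if (!s || !d) {
		free(s);
		free(d);
		return -1;
	}
	b->items[b->nr].src = s;
	b->items[b->nr].dst = d;
	b->nr++;
	return 0;
}

/*
 * End of transaction: one full flush on an already-open file (the index
 * lock file in the patch), then rename every queued temp file. If the
 * flush fails, no rename happens and the temp files are left behind.
 */
static int batch_commit(struct rename_batch *b, int flush_fd)
{
	int ret = 0;
	size_t i;

	assert(flush_fd >= 0);
	if (b->nr && fsync(flush_fd) < 0)
		ret = -1;
	for (i = 0; i < b->nr; i++) {
		/* Never rename before the flush has succeeded. */
		if (!ret && rename(b->items[i].src, b->items[i].dst) < 0)
			ret = -1;
		free(b->items[i].src);
		free(b->items[i].dst);
	}
	free(b->items);
	b->items = NULL;
	b->nr = b->alloc = 0;
	return ret;
}
```

The key ordering property is that no object gets its final name until after the single flush, so a crash can leave stray temp files but never a named object whose contents were lost.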

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename), such as NTFS and HFS+, we
would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.
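The distinction can be sketched as a small helper (full_flush is a hypothetical name) that uses F_FULLFSYNC on macOS and fsync() elsewhere, retrying on EINTR the way fsync_or_die does. Falling back to fsync() when fcntl() rejects F_FULLFSYNC is an assumption of this sketch; the patch simply returns the fcntl() result.

```c
#define _XOPEN_SOURCE 700
#define _DARWIN_C_SOURCE /* keep F_FULLFSYNC visible on macOS */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Sketch of a flush that reaches stable storage: F_FULLFSYNC on macOS,
 * fsync() elsewhere, retrying while the call is interrupted.
 */
static int full_flush(int fd)
{
	int rc;

	assert(fd >= 0);
	do {
#ifdef __APPLE__
		rc = fcntl(fd, F_FULLFSYNC);
		if (rc < 0 && errno != EINTR)
			rc = fsync(fd); /* assumed fallback, not in the patch */
#else
		rc = fsync(fd);
#endif
	} while (rc < 0 && errno == EINTR);
	return rc;
}
```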

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53
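Reading the table per file makes the cost clearer. The small C program below (struct timing, overhead_ms_per_file and print_summary are illustrative names) only does arithmetic on the numbers above: the extra time per added file relative to `false`, and the ratio of `true` to `batch`.

```c
#include <assert.h>
#include <stdio.h>

#define NFILES 500

/* Timings in seconds, copied from the table above. */
struct timing {
	const char *os;
	double off, on, batch; /* core.fsyncObjectFiles = false, true, batch */
};

static const struct timing timings[] = {
	{ "Linux",   0.06,  1.88, 0.15 },
	{ "Mac",     0.35, 11.18, 0.41 },
	{ "Windows", 0.61,  2.47, 1.53 },
};

/* Extra milliseconds per added file that a mode costs over 'false'. */
static double overhead_ms_per_file(double mode_s, double off_s, int nfiles)
{
	assert(nfiles > 0);
	return (mode_s - off_s) * 1000.0 / nfiles;
}

static void print_summary(void)
{
	size_t i;

	for (i = 0; i < sizeof(timings) / sizeof(timings[0]); i++) {
		const struct timing *t = &timings[i];
		printf("%-8s true: %5.2f ms/file  batch: %4.2f ms/file  true/batch: %4.1fx\n",
		       t->os,
		       overhead_ms_per_file(t->on, t->off, NFILES),
		       overhead_ms_per_file(t->batch, t->off, NFILES),
		       t->on / t->batch);
	}
}
```

Worked by hand, `true` adds roughly 3.6, 21.7 and 3.7 ms per file on Linux, Mac and Windows, versus about 0.18, 0.12 and 1.84 ms for `batch`, i.e. `batch` is about 12.5x, 27.3x and 1.6x faster than `true`.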

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 26 ++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  3 +-
 bulk-checkin.c                | 81 ++++++++++++++++++++++++++++++++++-
 bulk-checkin.h                |  5 ++-
 cache.h                       |  8 +++-
 config.c                      |  8 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 +++
 object-file.c                 | 22 +---------
 wrapper.c                     | 36 ++++++++++++++++
 write-or-die.c                |  2 +-
 14 files changed, 182 insertions(+), 33 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..0006d90980d 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,26 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls the object store, so updates to any refs or the
+	index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write object data with a minimal set of FLUSH CACHE
+  (or equivalent) commands sent to the storage controller. If the operating
+  system interfaces are not available, this mode behaves the same as `true`.
+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
+  filesystems and on Windows for repos stored on NTFS or ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 2244311d485..dda4bf093a0 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -678,7 +678,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..ddbab5e5c8c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,15 +3,19 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -62,6 +66,32 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
+{
+	if (fsync_state->nr) {
+		struct string_list_item *rename;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for_each_string_list_item(rename, fsync_state) {
+			const char *src = rename->string;
+			const char *dst = rename->util;
+
+			if (finalize_object_file(src, dst))
+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
+		}
+
+		string_list_clear(fsync_state, 1);
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct string_list *fsync_state,
+				    const char *src, const char *dst)
+{
+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime)
+{
+	int do_finalize = 1;
+	int ret = 0;
+
+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_plugged &&
+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
+			do_finalize = 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	if (mtime) {
+		struct utimbuf utb;
+		utb.actime = mtime;
+		utb.modtime = mtime;
+		if (utime(tmpfile, &utb) < 0)
+			warning_errno(_("failed utime() on %s"), tmpfile);
+	}
+
+	if (do_finalize)
+		ret = finalize_object_file(tmpfile, filename);
+
+	return ret;
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -273,10 +350,12 @@ void plug_bulk_checkin(void)
 	bulk_checkin_plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
 	assert(bulk_checkin_plugged);
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_fsync_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..4a3309c1531 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,14 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/cache.h b/cache.h
index d23de693680..39b3a88181a 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..9fe3602e1c4 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,13 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (!value)
+			return config_error_nonbool(var);
+		if (!strcasecmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else
+			fsync_object_files = git_config_bool(var, value)
+				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d6b22ede7ea..3e23eafff80 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index a8be8994814..ea14c3a3483 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
-	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
-	}
-
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
+							 filename.buf, mtime);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..cffe24d307a 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,42 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	if (action == FSYNC_WRITEOUT_ONLY) {
+#ifdef __APPLE__
+		/*
+		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+	}
+
+#ifdef __APPLE__
+	return fcntl(fd, F_FULLFSYNC);
+#else
+	return fsync(fd);
+#endif
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget



* [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-14  3:38     ` Neeraj Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function it calls is
available starting with Windows 8. If the function is not available,
we return -1 and Git falls back to doing a full fsync.

The operating system is told to flush file data only, without issuing a
hardware flush primitive. A later full fsync will cause the metadata log
to be flushed and then the disk cache to be flushed on NTFS and ReFS.
Other filesystems will treat this as a full flush operation.
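The fallback described above can be modeled portably with function pointers. This is a hypothetical sketch (sync_for_batch, writeout_fn and the stub functions are invented names): it tries a writeout-only primitive and, if that fails (for example with ENOSYS when NtFlushBuffersFileEx cannot be loaded), performs an immediate full flush so the object is not batched. In the patch this decision is made in fsync_and_close_loose_object_bulk_checkin(), which calls fsync_or_die() and dies on error rather than returning -1.

```c
#include <assert.h>
#include <errno.h>

typedef int (*writeout_fn)(int fd);

/* Stub for a platform without a writeout-only primitive. */
static int writeout_unavailable(int fd)
{
	(void)fd;
	errno = ENOSYS;
	return -1;
}

/* Stub for a platform where writeout-only succeeds. */
static int writeout_ok(int fd) { (void)fd; return 0; }

static int full_flushes; /* counts immediate full flushes, for illustration */

static int fake_full_flush(int fd) { (void)fd; full_flushes++; return 0; }

/*
 * Returns 1 if the object can join the batch (its rename is deferred),
 * 0 if we fell back to an immediate full flush, -1 if that also failed.
 */
static int sync_for_batch(int fd, writeout_fn writeout, int (*full_flush)(int))
{
	assert(writeout && full_flush);
	if (writeout(fd) >= 0)
		return 1;
	if (full_flush(fd) < 0)
		return -1;
	return 0;
}
```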

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 29 +++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..c013920ce37
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,29 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+       IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+       DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+       if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+       }
+
+       /* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
+       memset(&io_status, 0, sizeof(io_status));
+       if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+       }
+
+       return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index cffe24d307a..a9647018b68 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -565,6 +565,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 	}
-- 
gitgitgadget



* [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-09-14  3:38     ` [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-14  3:38     ` Neeraj Singh via GitGitGadget
  2021-09-14 19:35       ` Junio C Hamano
  2021-09-14  3:38     ` [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
                       ` (3 subsequent siblings)
  7 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin in update-index to speed up adding new
objects to the object database by leveraging the pack functionality and
the new bulk-fsync functionality. This mode is enabled when passing
paths to update-index via the --stdin flag, as is done by 'git stash'.

There is some risk with this change, since under batch fsync, the object
files will not be available until the update-index is entirely complete.
A caller is unlikely to depend on objects appearing mid-run, since any
tool invoking update-index and expecting to see objects would have to
snoop the output of --verbose to find out when update-index has actually
processed a given path. Additionally, the index is locked for the
duration of the update.
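A minimal, hypothetical model of this visibility trade-off (plug, add_object, unplug and process_paths are invented names, not Git's API): paths read from a stream are added while plugged, and their objects only become visible when the batch is unplugged at the end of the --stdin loop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int plugged;
static int queued, visible; /* objects deferred vs. already renamed */

static void plug(void) { assert(!plugged); plugged = 1; }

static void add_object(const char *path)
{
	(void)path;
	if (plugged)
		queued++;  /* deferred: not yet visible to other readers */
	else
		visible++; /* written and renamed immediately */
}

static void unplug(void)
{
	assert(plugged);
	plugged = 0;
	visible += queued; /* one flush, then all renames */
	queued = 0;
}

/* Mirrors the --stdin loop: plug once, process every path, unplug once. */
static void process_paths(FILE *in)
{
	char buf[4096];

	plug();
	while (fgets(buf, sizeof(buf), in)) {
		buf[strcspn(buf, "\n")] = '\0';
		if (*buf)
			add_object(buf);
	}
	unplug();
}
```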

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..b0689f2cdf6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1150,6 +1151,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		struct strbuf unquoted = STRBUF_INIT;
 
 		setup_work_tree();
+		plug_bulk_checkin();
 		while (getline_fn(&buf, stdin) != EOF) {
 			char *p;
 			if (!nul_term_line && buf.buf[0] == '"') {
@@ -1164,6 +1166,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				chmod_path(set_executable_bit, p);
 			free(p);
 		}
+		unplug_bulk_checkin(&lock_file);
 		strbuf_release(&unquoted);
 		strbuf_release(&buf);
 	}
-- 
gitgitgadget



* [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-09-14  3:38     ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-14  3:38     ` Neeraj Singh via GitGitGadget
  2021-09-14  3:38     ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
                       ` (2 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/lib-unique-files.sh | 32 ++++++++++++++++++++++++++
 t/perf/p3700-add.sh        | 43 +++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh      | 46 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)
 create mode 100644 t/perf/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/lib-unique-files.sh b/t/perf/lib-unique-files.sh
new file mode 100644
index 00000000000..10083395ae5
--- /dev/null
+++ b/t/perf/lib-unique-files.sh
@@ -0,0 +1,32 @@
+# Helper to create files with unique contents
+
+test_create_unique_files_base__=$(date -u)
+test_create_unique_files_counter__=0
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
+#				    each in the current directory, all
+#				    with unique contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir" > /dev/null
+		for j in $(test_seq $files)
+		do
+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..4ca3224f364
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/perf/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..407b95c104b
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/perf/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash 500 files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget



* [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-09-14  3:38     ` [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-09-14  3:38     ` Neeraj Singh via GitGitGadget
  2021-09-15 16:21       ` Junio C Hamano
  2021-09-14  5:49     ` [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles Christoph Hellwig
  2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
  7 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-14  3:38 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 environment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index 3e23eafff80..27d5e11267e 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
-- 
gitgitgadget

* Re: [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-09-14  3:38     ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
@ 2021-09-14  5:49     ` Christoph Hellwig
  2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
  7 siblings, 0 replies; 160+ messages in thread
From: Christoph Hellwig @ 2021-09-14  5:49 UTC (permalink / raw)
  To: Neeraj K. Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

On Tue, Sep 14, 2021 at 03:38:39AM +0000, Neeraj K. Singh via GitGitGadget wrote:
> NOTE: Based on Christoph Hellwig's comments, the 'batch' mode is not correct
> on Linux, since sync_file_range does not provide data integrity guarantees.
> There is currently no kernel interface suitable to achieve disk flush
> batching as is, but he suggested that he might implement a 'syncfs' variant
> on top of this patchset. This code is still useful on macOS and Windows, and
> the config documentation makes that clear.

If this series lands I can give the syncfs variant a spin.  It might not
be the best option for git hosting services, but I think it will be very
helpful for typical developer workstations.
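The syncfs variant is not spelled out in this thread. As a rough, hypothetical sketch of the shape it might take: once every object in a batch has been written and closed, a single syncfs(2) call on any descriptor from the object directory's filesystem replaces the per-file flushes. The helper name `flush_object_filesystem` is invented for illustration. syncfs(2) is Linux-specific, so the sketch falls back to the much coarser sync(2) elsewhere.

```c
#define _GNU_SOURCE          /* glibc declares syncfs() only with this */
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the filesystem that holds `objdir` once,
 * after all loose objects of a batch have been written and closed.
 * Returns 0 on success, -1 on error.
 */
int flush_object_filesystem(const char *objdir)
{
	int ret;
	int fd = open(objdir, O_RDONLY);   /* any fd on that filesystem works */

	if (fd < 0)
		return -1;
#ifdef __linux__
	ret = syncfs(fd);                  /* one call for the whole filesystem */
#else
	sync();                            /* coarser: asks every filesystem to write back */
	ret = 0;
#endif
	close(fd);
	return ret;
}
```

The trade-off hinted at above is visible in the sketch: syncfs flushes all dirty data on the filesystem, not only Git's, which is cheap on a quiet workstation but plausibly expensive on a busy hosting server.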

* Re: [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-14 10:39       ` Bagas Sanjaya
  2021-09-14 19:05         ` Neeraj Singh
  2021-09-14 19:34       ` Junio C Hamano
  1 sibling, 1 reply; 160+ messages in thread
From: Bagas Sanjaya @ 2021-09-14 10:39 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget, git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh

On 14/09/21 10.38, Neeraj Singh via GitGitGadget wrote:
> _Performance numbers_:
> 
> Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
> Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
> Windows - Same host as Linux, a preview version of Windows 11.
> 	  This number is from a patch later in the series.
> 
> Adding 500 files to the repo with 'git add' Times reported in seconds.
> 
> core.fsyncObjectFiles | Linux | Mac   | Windows
> ----------------------|-------|-------|--------
>                  false | 0.06  |  0.35 | 0.61
>                  true  | 1.88  | 11.18 | 2.47
>                  batch | 0.15  |  0.41 | 1.53

The performance numbers here are interesting.

You said that core.fsyncObjectFiles=batch is 2.5x slower than
core.fsyncObjectFiles=false on Linux and Windows. Why?

-- 
An old man doll... just what I always wanted! - Clara

* Re: [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-14 10:39       ` Bagas Sanjaya
@ 2021-09-14 19:05         ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-14 19:05 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

On Tue, Sep 14, 2021 at 3:39 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On 14/09/21 10.38, Neeraj Singh via GitGitGadget wrote:
> > _Performance numbers_:
> >
> > Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
> > Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
> > Windows - Same host as Linux, a preview version of Windows 11.
> >         This number is from a patch later in the series.
> >
> > Adding 500 files to the repo with 'git add' Times reported in seconds.
> >
> > core.fsyncObjectFiles | Linux | Mac   | Windows
> > ----------------------|-------|-------|--------
> >                  false | 0.06  |  0.35 | 0.61
> >                  true  | 1.88  | 11.18 | 2.47
> >                  batch | 0.15  |  0.41 | 1.53
>
> The performance numbers here are interesting.
>
> You said that core.fsyncObjectFiles=batch is 2.5x slower than
> core.fsyncObjectFiles=false on Linux and Windows. Why?
>

The goal of batch mode is to minimize the number of disk cache flush operations.
We still have to issue writes to the disk (and wait for them to complete) in
batch mode, and on my test system those writes have to cross the VM boundary.
The Mac is running macOS natively, so performance of the writes is probably a
little better.
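To make the distinction in this reply concrete, here is a minimal sketch (not the patch's code) of the batching idea: each file gets a writeout-only sync so its data reaches the device, and then a single full flush is paid for the whole batch. If the platform has no writeout-only primitive, it falls back to one full flush per file. The helper names `writeout_only`, `full_flush` and `flush_batch` are hypothetical. Per Christoph Hellwig's comments quoted in the cover letter, sync_file_range() gives no data-integrity guarantee on Linux, so treat this as an illustration of the flush-count arithmetic, not a durable design.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical: start and wait for writeback without a disk cache flush. */
static int writeout_only(int fd)
{
#if defined(__linux__)
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				     SYNC_FILE_RANGE_WRITE |
				     SYNC_FILE_RANGE_WAIT_AFTER);
#elif defined(__APPLE__)
	return fsync(fd);  /* on macOS, fsync() does not flush the drive cache */
#else
	(void)fd;
	errno = ENOSYS;
	return -1;
#endif
}

/* Hypothetical: full flush, including the drive's write cache where possible. */
static int full_flush(int fd)
{
#ifdef __APPLE__
	return fcntl(fd, F_FULLFSYNC);
#else
	return fsync(fd);
#endif
}

/*
 * Write out every file, then issue one full flush for the batch.
 * Falls back to a full flush per file if writeout-only is unsupported.
 * Stores the number of full flushes issued; returns 0 or -1 on error.
 */
int flush_batch(const int *fds, size_t n, int *full_flushes)
{
	size_t i;
	int batched = 0;

	*full_flushes = 0;
	for (i = 0; i < n; i++) {
		if (!writeout_only(fds[i])) {
			batched = 1;
			continue;
		}
		if (errno != ENOSYS)
			return -1;
		/* No writeout-only primitive: full flush for this file. */
		if (full_flush(fds[i]))
			return -1;
		(*full_flushes)++;
	}
	if (batched && n > 0) {
		if (full_flush(fds[n - 1]))
			return -1;
		(*full_flushes)++;
	}
	return 0;
}
```

For N files this issues one expensive flush instead of N, while the N writeouts remain, which is consistent with 'batch' still costing more than 'false' in the table.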

* Re: [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
  2021-09-14 10:39       ` Bagas Sanjaya
@ 2021-09-14 19:34       ` Junio C Hamano
  2021-09-14 20:33         ` Junio C Hamano
  2021-09-15  4:55         ` Neeraj Singh
  1 sibling, 2 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-09-14 19:34 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/config.c b/config.c
> index cb4a8058bff..9fe3602e1c4 100644
> --- a/config.c
> +++ b/config.c
> @@ -1509,7 +1509,13 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
>  	}
>  
>  	if (!strcmp(var, "core.fsyncobjectfiles")) {
> -		fsync_object_files = git_config_bool(var, value);
> +		if (!value)
> +			return config_error_nonbool(var);
> +		if (!strcasecmp(value, "batch"))
> +			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> +		else
> +			fsync_object_files = git_config_bool(var, value)
> +				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
>  		return 0;

The original code used to allow the short-and-sweet valueless true

	[core]
		fsyncobjectfiles

but it no longer does by calling it a nonbool error.  This breaks
existing users' repositories that have been happily working, doesn't
it?

Perhaps

	if (value && !strcmp(value, "batch"))
		fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
	else if (git_config_bool(var, value))
		fsync_object_files = FSYNC_OBJECT_FILES_ON;
	else
		fsync_object_files = FSYNC_OBJECT_FILES_OFF;

> -/* Finalize a file on disk, and close it. */
> -static void close_loose_object(int fd)
> -{
> -	if (fsync_object_files)
> -		fsync_or_die(fd, "loose object file");
> -	if (close(fd) != 0)
> -		die_errno(_("error when closing loose object file"));
> -}
> -
>  /* Size of directory component, including the ending '/' */
>  static inline int directory_size(const char *filename)
>  {
> @@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  		die(_("confused by unstable object source data for %s"),
>  		    oid_to_hex(oid));
>  
> -	close_loose_object(fd);
> -
> -	if (mtime) {
> -		struct utimbuf utb;
> -		utb.actime = mtime;
> -		utb.modtime = mtime;
> -		if (utime(tmp_file.buf, &utb) < 0)
> -			warning_errno(_("failed utime() on %s"), tmp_file.buf);
> -	}
> -
> -	return finalize_object_file(tmp_file.buf, filename.buf);
> +	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
> +							 filename.buf, mtime);
>  }

This block of code looked familiar and I was about to complain "why
add it in one step and remove it in another?"

But it is a different instance from the one that was added in one of
the previous patches ;-).  

> +int git_fsync(int fd, enum fsync_action action)
> +{
> +	if (action == FSYNC_WRITEOUT_ONLY) {
> +#ifdef __APPLE__
> +		/*
> +		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
> +		 * flush hardware caches.
> +		 */
> +		return fsync(fd);
> +#endif
> +
> +#ifdef HAVE_SYNC_FILE_RANGE
> +		/*
> +		 * On linux 2.6.17 and above, sync_file_range is the way to issue
> +		 * a writeback without a hardware flush. An offset of 0 and size of 0
> +		 * indicates writeout of the entire file and the wait flags ensure that all
> +		 * dirty data is written to the disk (potentially in a disk-side cache)
> +		 * before we continue.
> +		 */
> +
> +		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
> +						 SYNC_FILE_RANGE_WRITE |
> +						 SYNC_FILE_RANGE_WAIT_AFTER);
> +#endif
> +
> +		errno = ENOSYS;
> +		return -1;
> +	}

This allows the caller that can take advantage of writeout-only mode
to naturally fall back on the full sync per each file if we cannot do
a writeout-only sync.  OK.

> +#ifdef __APPLE__
> +	return fcntl(fd, F_FULLFSYNC);
> +#else
> +	return fsync(fd);
> +#endif
> +}

If we are introducing "enum fsync_action", we should have some way
to make it clear that we are covering all the possible values of
"action".

Switching on action, i.e.

	switch (action) {
	case FSYNC_WRITEOUT_ONLY:
		...
		break;
	case FSYNC_HARDWARE_FLUSH:
		...
		break;
	default:
		BUG("unexpected git_fsync(%d) call", action);
	}

would be one way to do so.
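A self-contained sketch of the switch-based shape, modeled on the git_fsync() quoted above. The BUG() macro here is a stand-in for Git's own, and the Linux branch assumes sync_file_range() is available (the patch guards it with HAVE_SYNC_FILE_RANGE).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum fsync_action {
	FSYNC_WRITEOUT_ONLY,
	FSYNC_HARDWARE_FLUSH
};

/* Stand-in for Git's BUG(): report and abort. */
#define BUG(...) do { fprintf(stderr, "BUG: " __VA_ARGS__); \
		      fputc('\n', stderr); abort(); } while (0)

int git_fsync(int fd, enum fsync_action action)
{
	switch (action) {
	case FSYNC_WRITEOUT_ONLY:
#if defined(__APPLE__)
		/* fsync() on macOS writes back without a hardware flush */
		return fsync(fd);
#elif defined(__linux__)
		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
					     SYNC_FILE_RANGE_WRITE |
					     SYNC_FILE_RANGE_WAIT_AFTER);
#else
		errno = ENOSYS;  /* caller falls back to FSYNC_HARDWARE_FLUSH */
		return -1;
#endif
	case FSYNC_HARDWARE_FLUSH:
#ifdef __APPLE__
		return fcntl(fd, F_FULLFSYNC);
#else
		return fsync(fd);
#endif
	default:
		BUG("unexpected git_fsync(%d) call", action);
	}
}
```

One design trade-off: a `default:` label catches out-of-range values at run time, while leaving it out lets compilers that support -Wswitch warn at build time when a new enumerator is added but not handled.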

Thanks.

* Re: [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure
  2021-09-14  3:38     ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-14 19:35       ` Junio C Hamano
  0 siblings, 0 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-09-14 19:35 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> The update-index functionality is used internally by 'git stash push' to
> setup the internal stashed commit.

Nice.

> This change enables bulk-checkin for update-index infrastructure to
> speed up adding new objects to the object database by leveraging the
> pack functionality and the new bulk-fsync functionality. This mode
> is enabled when passing paths to update-index via the --stdin flag,
> as is done by 'git stash'.
>
> There is some risk with this change, since under batch fsync, the object
> files will not be available until the update-index is entirely complete.
> This usage is unlikely, since any tool invoking update-index and
> expecting to see objects would have to snoop the output of --verbose to
> find out when update-index has actually processed a given path.
> Additionally the index is locked for the duration of the update.
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  builtin/update-index.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index 187203e8bb5..b0689f2cdf6 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -5,6 +5,7 @@
>   */
>  #define USE_THE_INDEX_COMPATIBILITY_MACROS
>  #include "cache.h"
> +#include "bulk-checkin.h"
>  #include "config.h"
>  #include "lockfile.h"
>  #include "quote.h"
> @@ -1150,6 +1151,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>  		struct strbuf unquoted = STRBUF_INIT;
>  
>  		setup_work_tree();
> +		plug_bulk_checkin();
>  		while (getline_fn(&buf, stdin) != EOF) {
>  			char *p;
>  			if (!nul_term_line && buf.buf[0] == '"') {
> @@ -1164,6 +1166,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
>  				chmod_path(set_executable_bit, p);
>  			free(p);
>  		}
> +		unplug_bulk_checkin(&lock_file);
>  		strbuf_release(&unquoted);
>  		strbuf_release(&buf);
>  	}
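The risk described in this commit message comes from deferring the final renames until unplug time. A minimal, hypothetical model of plug/unplug (these are not the bulk-checkin.c functions, and the fixed-size arrays are a simplification) shows why an object is not visible under its final name until the batch is unplugged.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING 64
#define PATH_LEN 256

/* Hypothetical plugged state; bulk-checkin.c keeps its own. */
static struct {
	int plugged;
	size_t nr;
	char tmp[MAX_PENDING][PATH_LEN];
	char dst[MAX_PENDING][PATH_LEN];
} bulk;

void plug_sketch(void)
{
	bulk.plugged = 1;
}

/* Move a finished temp file into place, now or at unplug time. */
int finalize_object_sketch(const char *tmp, const char *dst)
{
	if (!bulk.plugged)
		return rename(tmp, dst);
	if (bulk.nr == MAX_PENDING ||
	    strlen(tmp) >= PATH_LEN || strlen(dst) >= PATH_LEN)
		return -1;
	strcpy(bulk.tmp[bulk.nr], tmp);
	strcpy(bulk.dst[bulk.nr], dst);
	bulk.nr++;
	return 0;
}

int unplug_sketch(void)
{
	size_t i;
	int ret = 0;

	/*
	 * A real implementation would issue the single batched flush
	 * here, before any object appears under its final name.
	 */
	for (i = 0; i < bulk.nr; i++)
		if (rename(bulk.tmp[i], bulk.dst[i]))
			ret = -1;
	bulk.nr = 0;
	bulk.plugged = 0;
	return ret;
}
```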

* Re: [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-14 19:34       ` Junio C Hamano
@ 2021-09-14 20:33         ` Junio C Hamano
  2021-09-15  4:55         ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-09-14 20:33 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

Junio C Hamano <gitster@pobox.com> writes:

> Perhaps
>
> 	if (value && !strcmp(value, "batch"))
> 		fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> 	else if (git_config_bool(var, value))
> 		fsync_object_files = FSYNC_OBJECT_FILES_ON;
> 	else
> 		fsync_object_files = FSYNC_OBJECT_FILES_OFF;

By the way, in case it wasn't clear, I do mean strcmp and not
strcasecmp.  Making these things that are meant to be machine
readable tokens to be spelled in different ways in the name of
"friendliness" is a disease.
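A runnable sketch of the parsing suggested above. git_config_bool() is replaced by a simplified, hypothetical stub (`config_bool_stub`); Git's real function accepts more spellings and dies on invalid input rather than returning -1. Because "batch" is matched with strcmp(), "Batch" falls through to the boolean parser and is rejected, while a valueless key still means true.

```c
#include <string.h>
#include <strings.h>

enum FSYNC_OBJECT_FILES_MODE {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Simplified stand-in for git_config_bool(): a valueless key means true.
 * Returns 1 or 0, or -1 for a value that is not a boolean.
 */
static int config_bool_stub(const char *value)
{
	if (!value)
		return 1;
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!*value || !strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

/* Returns 0 and stores the mode, or -1 for an invalid value. */
int parse_fsync_object_files(const char *value,
			     enum FSYNC_OBJECT_FILES_MODE *mode)
{
	int b;

	if (value && !strcmp(value, "batch")) {  /* case-sensitive on purpose */
		*mode = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = config_bool_stub(value);
	if (b < 0)
		return -1;
	*mode = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```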

Thanks.

* Re: [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-14 19:34       ` Junio C Hamano
  2021-09-14 20:33         ` Junio C Hamano
@ 2021-09-15  4:55         ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-15  4:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

On Tue, Sep 14, 2021 at 12:34 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/config.c b/config.c
> > index cb4a8058bff..9fe3602e1c4 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -1509,7 +1509,13 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
> >       }
> >
> >       if (!strcmp(var, "core.fsyncobjectfiles")) {
> > -             fsync_object_files = git_config_bool(var, value);
> > +             if (!value)
> > +                     return config_error_nonbool(var);
> > +             if (!strcasecmp(value, "batch"))
> > +                     fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> > +             else
> > +                     fsync_object_files = git_config_bool(var, value)
> > +                             ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
> >               return 0;
>
> The original code used to allow the short-and-sweet valueless true
>
>         [core]
>                 fsyncobjectfiles
>
> but it no longer does by calling it a nonbool error.  This breaks
> existing users' repositories that have been happily working, doesn't
> it?
>
> Perhaps
>
>         if (value && !strcmp(value, "batch"))
>                 fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
>         else if (git_config_bool(var, value))
>                 fsync_object_files = FSYNC_OBJECT_FILES_ON;
>         else
>                 fsync_object_files = FSYNC_OBJECT_FILES_OFF;

I'll take your suggestion, including the change to case-sensitive.

> > +#ifdef __APPLE__
> > +     return fcntl(fd, F_FULLFSYNC);
> > +#else
> > +     return fsync(fd);
> > +#endif
> > +}
>
> If we are introducing "enum fsync_action", we should have some way
> to make it clear that we are covering all the possible values of
> "action".
>
> Switching on action, i.e.
>
>         switch (action) {
>         case FSYNC_WRITEOUT_ONLY:
>                 ...
>                 break;
>         case FSYNC_HARDWARE_FLUSH:
>                 ...
>                 break;
>         default:
>                 BUG("unexpected git_fsync(%d) call", action);
>         }
>
> would be one way to do so.
>

Will do.

Thanks for reviewing my changes. I've updated the github PR.
I'll wait for a few more days to see if anyone has more feedback
before sending out another round of patches.

* Re: [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing
  2021-09-14  3:38     ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
@ 2021-09-15 16:21       ` Junio C Hamano
  2021-09-15 22:43         ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Junio C Hamano @ 2021-09-15 16:21 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  environment.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/environment.c b/environment.c
> index 3e23eafff80..27d5e11267e 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -43,7 +43,7 @@ const char *git_hooks_path;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int core_compression_level;
>  int pack_compression_level = Z_DEFAULT_COMPRESSION;
> -enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
> +enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>  size_t delta_base_cache_limit = 96 * 1024 * 1024;

Despite what the title of the change claims, this is not "enable for
testing", but "enable for everybody even in production", isn't it?

I'd prefer we do not do this, certainly not for "testing".

If setting the variable to "batch" were meant to eventually improve
performance for all different flavours of workload, I do not think
we would mind if we set it to "batch" for those who opt into the
"experimental" set of features by setting the feature.experimental
configuration variable to true.  And after a few development cycles
when the feature proves to be useful for everybody, we may want to
apply this patch under a justification that is different from "for
testing".

On the other hand, if this is meant to help 85% of people while
degrading the remainder of workflow, I do not think we would want to
see this change without a warning that says something along the
lines of "under rare circumstances (e.g. if you employ such and such
workflow), the new default value used for the core.fsyncObjectFiles
configuration variable will hurt performance."

Since this is about answering the question "between performance and
crash resilience, where do you as an end user strike the balance for
your needs?", I do not think it falls into either of the above two
categories.  

The only plausible justification I can think of for applying a "we
default to 'batch' for everybody" patch is something like:

    Now with the 'batch' setting for core.fsyncObjectFiles, unlike
    'true' that paid very high overhead, the overhead to ensure our
    writes hit the disk platters has so greatly been reduced that it
    hurts the performance only negligibly.  Let's switch the default
    from the unsafe value of 'false' to safer and performant value
    of 'batch'.

However, I doubt we are there yet with the current round of patches.

* Re: [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing
  2021-09-15 16:21       ` Junio C Hamano
@ 2021-09-15 22:43         ` Neeraj Singh
  2021-09-15 23:12           ` Junio C Hamano
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-15 22:43 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

On Wed, Sep 15, 2021 at 9:21 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> > ---
> >  environment.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/environment.c b/environment.c
> > index 3e23eafff80..27d5e11267e 100644
> > --- a/environment.c
> > +++ b/environment.c
> > @@ -43,7 +43,7 @@ const char *git_hooks_path;
> >  int zlib_compression_level = Z_BEST_SPEED;
> >  int core_compression_level;
> >  int pack_compression_level = Z_DEFAULT_COMPRESSION;
> > -enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
> > +enum FSYNC_OBJECT_FILES_MODE fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> >  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
> >  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
> >  size_t delta_base_cache_limit = 96 * 1024 * 1024;
>
> Despite what the title of the change claims, this is not "enable for
> testing", but "enable for everybody even in production", isn't it?
>
> I'd prefer we do not do this, certainly not for "testing".
>
> If setting the variable to "batch" were meant to eventually improve
> performance for all different flavours of workload, I do not think
> we would mind if we set it to "batch" for those who opt into the
> "experimental" set of features by setting the feature.experimental
> configuration variable to true.  And after a few development cycles
> when the feature proves to be useful for everybody, we may want to
> apply this patch under a justification that is different from "for
> testing".
>
> On the other hand, if this is meant to help 85% of people while
> degrading the remainder of workflow, I do not think we would want to
> see this change without a warning that says something along the
> lines of "under rare circumstances (e.g. if you employ such and such
> workflow), the new default value used for the core.fsyncObjectFiles
> configuration variable will hurt performance."
>
> Since this is about answering the question "between performance and
> crash resilience, where do you as an end user strike the balance for
> your needs?", I do not think it falls into either of the above two
> categories.
>
> The only plausible justification I can think of for applying a "we
> default to 'batch' for everybody" patch is something like:
>
>     Now with the 'batch' setting for core.fsyncObjectFiles, unlike
>     'true' that paid very high overhead, the overhead to ensure our
>     writes hit the disk platters has so greatly been reduced that it
>     hurts the performance only negligibly.  Let's switch the default
>     from the unsafe value of 'false' to safer and performant value
>     of 'batch'.
>
> However, I doubt we are there yet with the current round of patches.

Sorry for being unclear here (and perhaps including an improper patch).
This commit is mainly to ensure that we get coverage of batch mode on all
platforms in the CI infrastructure.  I don't believe it should be included in
mainline git without significantly more discussion and experimentation.

However, I'd hope that Git for Windows would be able to adopt batch mode
by default when they pull this series in. They are currently enabling fsync
by default.

Batch mode does have more cost, particularly on rotational media.
I think git should eventually enable batch mode by default with the proviso that
maintainers and people running ephemeral CI infrastructure should turn fsync off
if they care more about speed than durability.

Do you think that feature.experimental is a good place to put this right away,
or should we just leave this as an option that Git for Windows can pick up and
leave the other platforms alone?

Thanks,
Neeraj

* Re: [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing
  2021-09-15 22:43         ` Neeraj Singh
@ 2021-09-15 23:12           ` Junio C Hamano
  2021-09-16  6:19             ` Junio C Hamano
  0 siblings, 1 reply; 160+ messages in thread
From: Junio C Hamano @ 2021-09-15 23:12 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

Neeraj Singh <nksingh85@gmail.com> writes:

> This commit is mainly to ensure that we get coverage of batch mode on all
> platforms in the CI infrastructure.  I don't believe it should be included in
> mainline git without significantly more discussion and experimentation.

Am I incorrect to say that only just a handful of code paths can
take advantage of the bulk checkin "plugging-unplugging" feature to
begin with, so running _all_ the existing tests that cover
everything with this core.fsyncobjectfiles=batch setting is rather
pointless?

If so, perhaps instead of 6/6, you should identify key code paths
that would be affected by this feature (perhaps "git add" is one of
them), and either write a new test script dedicated to this feature
or piggy-back on existing test scripts that already test those code
paths, adding new test pieces there that exercise this new feature.

If it is a good idea to run all the tests with core.fsyncobjectfiles
set to batch, however, it probably is easiest to invent a new
environment variable GIT_TEST_FORCE_CORE_FSYNCOBJECTFILES and have
it honored as the default when it is set, and add a NEW CI job that
exports the environment with the value "batch".  Other people
(including the ones from Microsoft, I think) are much more familiar
than I am with how to make this kind of thing work in GitHub Actions.
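A hedged sketch of the environment-variable idea. GIT_TEST_FORCE_CORE_FSYNCOBJECTFILES is only the name proposed in this message, not an existing Git variable, and `default_fsync_mode()` is hypothetical. It supplies only the initial value, so an explicit core.fsyncObjectFiles setting parsed afterwards would still take precedence.

```c
#include <stdlib.h>
#include <string.h>

enum FSYNC_OBJECT_FILES_MODE {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Hypothetical: the default used when core.fsyncObjectFiles is not
 * configured. A CI job could export the proposed variable as "batch".
 */
enum FSYNC_OBJECT_FILES_MODE default_fsync_mode(void)
{
	const char *v = getenv("GIT_TEST_FORCE_CORE_FSYNCOBJECTFILES");

	if (!v)
		return FSYNC_OBJECT_FILES_OFF;   /* the normal default */
	if (!strcmp(v, "batch"))
		return FSYNC_OBJECT_FILES_BATCH;
	if (!strcmp(v, "true") || !strcmp(v, "1"))
		return FSYNC_OBJECT_FILES_ON;
	return FSYNC_OBJECT_FILES_OFF;
}
```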

> Do you think that feature.experimental is a good place to put this right away,

I think feature.experimental should be used for something that we
hope would benefit "everybody", not "most of the users".  This is a
promise to our testers: those who opt into an "early preview" of
upcoming features should not be subjected to "this may or may not
give better experiences depending on your workflow".  They may
already be enjoying and even relying on other experimental features
by opting in, and we should strive not to give them a reason to turn
the feature.experimental bit off, saying "this new experimental
feature that recently joined does not work for my use case."



* Re: [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing
  2021-09-15 23:12           ` Junio C Hamano
@ 2021-09-16  6:19             ` Junio C Hamano
  0 siblings, 0 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-09-16  6:19 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

Junio C Hamano <gitster@pobox.com> writes:

> Neeraj Singh <nksingh85@gmail.com> writes:
>
>> This commit is mainly to ensure that we get coverage of batch mode on all
>> platforms in the CI infrastructure.  I don't believe it should be included in
>> mainline git without significantly more discussion and experimentation.
>
> Am I incorrect to say that only just a handful of code paths can
> take advantage of the bulk checkin "plugging-unplugging" feature to
> begin with, so running _all_ the existing tests that cover
> everything with this core.fsyncobjectfiles=batch setting is rather
> pointless?
>
> If so, perhaps instead of 6/6, you should identify key code paths
> that would be affected by this feature (perhaps "git add" is one of
> them), and either write a new test script dedicated to this feature
> or piggy-back on existing test scripts that already test those code
> paths, adding new test pieces there that exercise this new feature.
>
> If it is a good idea to run all the tests with core.fsyncobjectfiles
> set to batch, however, it probably is easiest to invent a new
> environment variable GIT_TEST_FORCE_CORE_FSYNCOBJECTFILES and have
> it honored as the default when it is set, and add a NEW CI job that
> exports the environment with the value "batch".  

I have to take part of this back.  A new environment variable that
is honored in the absence of core.fsyncobjectfiles would be needed
if you need to run all tests, but you do not necessarily have to add
a new CI job; instead you should be able to piggyback on an
existing job, by mimicking the way ci/run-build-and-tests.sh
enables various test options on one of the jobs.

> Other people
> (including the ones from Microsoft, I think) are much more familiar
> than I am with how to make this kind of thing work in GitHub Actions.

This part still stands ;-)  There might be a better way than adding
yet another environment variable.


* [PATCH v4 0/6] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-09-14  5:49     ` [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles Christoph Hellwig
@ 2021-09-20 22:15     ` Neeraj K. Singh via GitGitGadget
  2021-09-20 22:15       ` [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                         ` (6 more replies)
  7 siblings, 7 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

Thanks to everyone for the reviews so far! Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
   accept no value to mean "true" and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for Windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.

Changes since v2:

 * Removed an unused Makefile define (FSYNC_DOESNT_FLUSH) that slipped in
   from an intermediate change.

 * Drop the futimens part of the patch and return to just calling utime, now
   within the new bulk_checkin code. The utime to futimens change seemed to
   be problematic for some platforms (thanks Randall Becker), and is really
   orthogonal to the rest of the patch series.

 * (Optional commit) Enable batch mode by default so that we can shake loose
   any issues relating to deferring the renames until the
   unplug_bulk_checkin.

Changes since v1:

 * Switch from futimes(2) to futimens(2), which is in POSIX.1-2008. Contrary
   to dscho's suggestion, I'm still implementing the Windows version in the
   same patch and I'm not doing autoconf detection since this is a POSIX
   function.

 * Introduce a separate preparatory patch to the bulk-checkin infrastructure
   to separate the 'plugged' variable and rename the 'state' variable, as
   suggested by dscho.

 * Add performance numbers to the commit message of the main bulk fsync
   patch, as suggested by dscho.

 * Add a comment about the non-thread-safety of the bulk-checkin
   infrastructure, as suggested by avarab.

 * Rename the experimental mode to core.fsyncobjectfiles=batch, as suggested
   by dscho and avarab and others.

 * Add more details to Documentation/config/core.txt about the various
   settings and their intended effects, as suggested by avarab.

 * Switch to the string-list API to hold the rename state, as suggested by
   avarab.

 * Create a separate update-index patch to use bulk-checkin as suggested by
   dscho.

 * Add Windows support in upstream Git. This is done in a way that should
   not conflict with git-for-windows.

 * Add new performance tests that show the difference between fsync modes.

NOTE: Based on Christoph Hellwig's comments, 'batch' mode is not correct
on Linux, since sync_file_range does not provide data integrity guarantees.
There is currently no kernel interface suitable for batching disk flushes
the way this series does, but he suggested that he might implement a
'syncfs' variant on top of this patchset. This code is still useful on
macOS and Windows, and the config documentation makes that clear.

Neeraj Singh (6):
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       |  26 +++++--
 Makefile                            |   6 ++
 builtin/add.c                       |   3 +-
 builtin/update-index.c              |   3 +
 bulk-checkin.c                      | 103 +++++++++++++++++++++++++---
 bulk-checkin.h                      |   5 +-
 cache.h                             |   8 ++-
 compat/mingw.h                      |   3 +
 compat/win32/flush.c                |  29 ++++++++
 config.c                            |   7 +-
 config.mak.uname                    |   3 +
 configure.ac                        |   8 +++
 contrib/buildsystems/CMakeLists.txt |   3 +-
 environment.c                       |   2 +-
 git-compat-util.h                   |   7 ++
 object-file.c                       |  22 +-----
 t/lib-unique-files.sh               |  34 +++++++++
 t/perf/p3700-add.sh                 |  43 ++++++++++++
 t/perf/p3900-stash.sh               |  46 +++++++++++++
 t/t3700-add.sh                      |  11 +++
 t/t3903-stash.sh                    |  14 ++++
 wrapper.c                           |  48 +++++++++++++
 write-or-die.c                      |   2 +-
 23 files changed, 392 insertions(+), 44 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v4
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v3:

 1:  d5893e28df1 = 1:  d5893e28df1 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 2:  f8b5b709e9e ! 2:  12cad737635 core.fsyncobjectfiles: batched disk flushes
     @@ config.c: static int git_default_core_config(const char *var, const char *value,
       
       	if (!strcmp(var, "core.fsyncobjectfiles")) {
      -		fsync_object_files = git_config_bool(var, value);
     -+		if (!value)
     -+			return config_error_nonbool(var);
     -+		if (!strcasecmp(value, "batch"))
     ++		if (value && !strcmp(value, "batch"))
      +			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
     ++		else if (git_config_bool(var, value))
     ++			fsync_object_files = FSYNC_OBJECT_FILES_ON;
      +		else
     -+			fsync_object_files = git_config_bool(var, value)
     -+				? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
     ++			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
       		return 0;
       	}
       
     @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
       
      +int git_fsync(int fd, enum fsync_action action)
      +{
     -+	if (action == FSYNC_WRITEOUT_ONLY) {
     ++	switch (action) {
     ++	case FSYNC_WRITEOUT_ONLY:
     ++
      +#ifdef __APPLE__
      +		/*
     -+		 * on Mac OS X, fsync just causes filesystem cache writeback but does not
     ++		 * on macOS, fsync just causes filesystem cache writeback but does not
      +		 * flush hardware caches.
      +		 */
      +		return fsync(fd);
     @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
      +
      +		errno = ENOSYS;
      +		return -1;
     -+	}
     ++
     ++	case FSYNC_HARDWARE_FLUSH:
      +
      +#ifdef __APPLE__
     -+	return fcntl(fd, F_FULLFSYNC);
     ++		return fcntl(fd, F_FULLFSYNC);
      +#else
     -+	return fsync(fd);
     ++		return fsync(fd);
      +#endif
     ++
     ++	default:
     ++		BUG("unexpected git_fsync(%d) call", action);
     ++	}
     ++
      +}
      +
       static int warn_if_unremovable(const char *op, const char *file, int rc)
 3:  815a862e229 ! 3:  a5b3e21b762 core.fsyncobjectfiles: add windows support for batch mode
     @@ wrapper.c: int git_fsync(int fd, enum fsync_action action)
      +
       		errno = ENOSYS;
       		return -1;
     - 	}
     + 
 4:  6b576038986 = 4:  f7f756f3932 update-index: use the bulk-checkin infrastructure
 -:  ----------- > 5:  afb0028e796 core.fsyncobjectfiles: tests for batch mode
 5:  b7ca3ba9302 ! 6:  3e6b80b5fa2 core.fsyncobjectfiles: performance tests for add and stash
     @@ Commit message
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     - ## t/perf/lib-unique-files.sh (new) ##
     -@@
     -+# Helper to create files with unique contents
     -+
     -+test_create_unique_files_base__=$(date -u)
     -+test_create_unique_files_counter__=0
     -+
     -+# Create multiple files with unique contents. Takes the number of
     -+# directories, the number of files in each directory, and the base
     -+# directory.
     -+#
     -+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
     -+#				    each in the current directory, all
     -+#				    with unique contents.
     -+
     -+test_create_unique_files() {
     -+	test "$#" -ne 3 && BUG "3 param"
     -+
     -+	local dirs=$1
     -+	local files=$2
     -+	local basedir=$3
     -+
     -+	for i in $(test_seq $dirs)
     -+	do
     -+		local dir=$basedir/dir$i
     -+
     -+		mkdir -p "$dir" > /dev/null
     -+		for j in $(test_seq $files)
     -+		do
     -+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
     -+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
     -+		done
     -+	done
     -+}
     -
       ## t/perf/p3700-add.sh (new) ##
      @@
      +#!/bin/sh
     @@ t/perf/p3700-add.sh (new)
      +
      +. ./perf-lib.sh
      +
     -+. $TEST_DIRECTORY/perf/lib-unique-files.sh
     ++. $TEST_DIRECTORY/lib-unique-files.sh
      +
      +test_perf_default_repo
      +test_checkout_worktree
     @@ t/perf/p3700-add.sh (new)
      +# We need to create the files each time we run the perf test, but
      +# we do not want to measure the cost of creating the files, so run
      +# the tet once.
     -+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
     ++if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
      +then
      +	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
      +	GIT_PERF_REPEAT_COUNT=1
     @@ t/perf/p3900-stash.sh (new)
      +
      +. ./perf-lib.sh
      +
     -+. $TEST_DIRECTORY/perf/lib-unique-files.sh
     ++. $TEST_DIRECTORY/lib-unique-files.sh
      +
      +test_perf_default_repo
      +test_checkout_worktree
     @@ t/perf/p3900-stash.sh (new)
      +# We need to create the files each time we run the perf test, but
      +# we do not want to measure the cost of creating the files, so run
      +# the tet once.
     -+if test "$GIT_PERF_REPEAT_COUNT" -ne 1
     ++if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
      +then
      +	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
      +	GIT_PERF_REPEAT_COUNT=1
 6:  55a40fc8fd5 < -:  ----------- core.fsyncobjectfiles: enable batch mode for testing

-- 
gitgitgadget

* [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
From: Neeraj Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename the 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'. This also makes the variable easier to
  find in the debugger, since the name is more distinctive.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. This keeps finish_bulk_checkin from resetting the flag
  when it zeroes 'bulk_checkin_state'. The current code appears to
  unintentionally disable plugging the first time a new packfile must be
  created because of packfile size limits. Losing the plugged state only
  results in suboptimal behavior today, but it would be fatal for the
  bulk-fsync functionality added later in this patch series.
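
The following simplified model (not Git's actual structures; all names are hypothetical) shows why the flag has to move. It assumes, as described above, that finishing a pack zeroes the whole state struct, so a flag stored inside that struct is silently cleared, while a separate static survives.

```c
#include <assert.h>
#include <string.h>

/* Old layout (simplified): the plug flag lives inside the zeroed state. */
struct old_checkin_state {
	unsigned plugged:1;
	int nr_written;          /* stand-in for the pack-writing fields */
};

/* New layout (simplified): the flag is a separate static variable. */
static int checkin_plugged;
struct new_checkin_state {
	int nr_written;
};

/* Like finish_bulk_checkin(): reset all state once a pack is finished. */
void finish_old(struct old_checkin_state *s)
{
	memset(s, 0, sizeof(*s));   /* also clears 'plugged' by accident */
}

void finish_new(struct new_checkin_state *s)
{
	memset(s, 0, sizeof(*s));   /* 'checkin_plugged' is untouched */
}
```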

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget


* [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes
From: Neeraj Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows,
macOS, and Linux each offer mechanisms to write data from the filesystem
page cache without initiating a hardware flush.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
takes advantage of the bulk-checkin infrastructure to batch up hardware
flushes.

When the new mode is enabled we do the following for new objects:

1. Create a tmp_obj_XXXX file and write the object data to it.
2. Issue a pagecache writeback request and wait for it to complete.
3. Record the tmp name and the final name in the bulk-checkin state for
   later rename.

At the end of the entire transaction we:
1. Issue an fsync against the lock file to flush the hardware writeback
   cache, which should by now have processed the tmp file writes.
2. Rename all of the temp files to their final names.
3. Rely on Git to issue another fsync as part of any subsequent update
   to the index and/or refs.

On a filesystem with a single journal that is updated during name
operations (e.g. create, link, rename), such as NTFS and HFS+, we
would expect the fsync to trigger a journal writeout, so this sequence
should be enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously,
macOS had no guarantee of durability, since a plain fsync(2) call
does not flush any hardware caches.
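
A minimal POSIX sketch of the steps above follows. It makes several assumptions: the function names are hypothetical, the rename queue is a fixed-size array (the patch uses Git's string-list API), and sync_file_range(2) serves as the Linux writeout-only primitive. Per the cover letter's note, sync_file_range does not provide data-integrity guarantees on Linux, so this illustrates the ordering rather than a durable implementation. The final flush uses fcntl(F_FULLFSYNC) where that constant exists, as the patch does on macOS; falling back to fsync(2) when it fails is an extra assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING 64

/* One deferred rename: a written-back temp file and its final name. */
struct pending_rename {
	char tmp[256];
	char dst[256];
};

static struct pending_rename pending[MAX_PENDING];
static int nr_pending;

/* Step 2: write file data out of the page cache, ideally without a drive flush. */
static int writeout_only(int fd)
{
#if !defined(__APPLE__) && defined(SYNC_FILE_RANGE_WRITE)
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				       SYNC_FILE_RANGE_WRITE |
				       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	/* macOS fsync() is writeback-only; elsewhere this is a full fsync. */
	return fsync(fd);
#endif
}

/* One flush of the drive's write cache (F_FULLFSYNC on macOS). */
static int hardware_flush(int fd)
{
#ifdef F_FULLFSYNC
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return 0;
#endif
	return fsync(fd);
}

/* Steps 1 and 3: write a temp file, write it back, and queue its rename. */
int batch_add_file(const char *dir, const char *name, const char *data)
{
	struct pending_rename *p;
	size_t len = strlen(data);
	int fd;

	if (nr_pending >= MAX_PENDING)
		return -1;
	p = &pending[nr_pending];
	snprintf(p->tmp, sizeof(p->tmp), "%s/tmp_obj_XXXXXX", dir);
	snprintf(p->dst, sizeof(p->dst), "%s/%s", dir, name);

	fd = mkstemp(p->tmp);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || writeout_only(fd) < 0) {
		close(fd);
		unlink(p->tmp);
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	nr_pending++;
	return 0;
}

/* End of transaction: one hardware flush via flush_fd, then all renames. */
int batch_finish(int flush_fd)
{
	int i;

	if (nr_pending && hardware_flush(flush_fd) < 0)
		return -1;
	for (i = 0; i < nr_pending; i++)
		if (rename(pending[i].tmp, pending[i].dst) < 0)
			return -1;
	nr_pending = 0;
	return 0;
}
```

No file is visible under its final name until batch_finish() runs, which mirrors how the patch defers the renames until unplug_bulk_checkin().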

_Performance numbers_:

Linux - Hyper-V VM running kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, running a preview version of Windows 11.
	  These numbers are from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times are reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53
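
To read the table as per-file costs and speedups, divide each time by the 500 files and compare columns. The helper names below are only for illustration.

```c
#include <assert.h>

/* Average wall-clock cost per file, in milliseconds. */
double per_file_ms(double total_seconds, int nfiles)
{
	return total_seconds * 1000.0 / nfiles;
}

/* Ratio of a slower timing to a faster one (e.g. 'true' vs 'batch'). */
double speedup(double slow_seconds, double fast_seconds)
{
	return slow_seconds / fast_seconds;
}
```

For example, per_file_ms(11.18, 500) is about 22.4 ms per object on the Mac with `true`, and speedup(11.18, 0.41) is about 27. The same arithmetic gives roughly 12.5x for Linux (1.88 / 0.15) and roughly 1.6x for Windows (2.47 / 1.53). Keep in mind the cover letter's note that batch mode is not a data-integrity guarantee on Linux.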

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 26 ++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  3 +-
 bulk-checkin.c                | 81 ++++++++++++++++++++++++++++++++++-
 bulk-checkin.h                |  5 ++-
 cache.h                       |  8 +++-
 config.c                      |  7 ++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 +++
 object-file.c                 | 22 +---------
 wrapper.c                     | 44 +++++++++++++++++++
 write-or-die.c                |  2 +-
 14 files changed, 189 insertions(+), 33 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..0006d90980d 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,26 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls the object store, so updates to any refs or the
+	index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write object data with a minimal set of FLUSH CACHE
+  (or equivalent) commands sent to the storage controller. If the operating
+  system interfaces are not available, this mode behaves the same as `true`.
+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
+  filesystems and on Windows for repos stored on NTFS or ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 2244311d485..dda4bf093a0 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -678,7 +678,8 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
-	unplug_bulk_checkin();
+
+	unplug_bulk_checkin(&lock_file);
 
 finish:
 	if (write_locked_index(&the_index, &lock_file,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..ddbab5e5c8c 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,15 +3,19 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
 
+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
+
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
@@ -62,6 +66,32 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
+{
+	if (fsync_state->nr) {
+		struct string_list_item *rename;
+
+		/*
+		 * Issue a full hardware flush against the lock file to ensure
+		 * that all objects are durable before any renames occur.
+		 * The code in fsync_and_close_loose_object_bulk_checkin has
+		 * already ensured that writeout has occurred, but it has not
+		 * flushed any writeback cache in the storage hardware.
+		 */
+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
+
+		for_each_string_list_item(rename, fsync_state) {
+			const char *src = rename->string;
+			const char *dst = rename->util;
+
+			if (finalize_object_file(src, dst))
+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
+		}
+
+		string_list_clear(fsync_state, 1);
+	}
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+static void add_rename_bulk_checkin(struct string_list *fsync_state,
+				    const char *src, const char *dst)
+{
+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
+}
+
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime)
+{
+	int do_finalize = 1;
+	int ret = 0;
+
+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
+		/*
+		 * If we have a plugged bulk checkin, we issue a call that
+		 * cleans the filesystem page cache but avoids a hardware flush
+		 * command. Later on we will issue a single hardware flush
+		 * before renaming files as part of do_sync_and_rename.
+		 */
+		if (bulk_checkin_plugged &&
+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
+			do_finalize = 0;
+
+		} else {
+			fsync_or_die(fd, "loose object file");
+		}
+	}
+
+	if (close(fd))
+		die_errno(_("error when closing loose object file"));
+
+	if (mtime) {
+		struct utimbuf utb;
+		utb.actime = mtime;
+		utb.modtime = mtime;
+		if (utime(tmpfile, &utb) < 0)
+			warning_errno(_("failed utime() on %s"), tmpfile);
+	}
+
+	if (do_finalize)
+		ret = finalize_object_file(tmpfile, filename);
+
+	return ret;
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -273,10 +350,12 @@ void plug_bulk_checkin(void)
 	bulk_checkin_plugged = 1;
 }
 
-void unplug_bulk_checkin(void)
+void unplug_bulk_checkin(struct lock_file *lock_file)
 {
 	assert(bulk_checkin_plugged);
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_sync_and_rename(&bulk_fsync_state, lock_file);
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..4a3309c1531 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,11 +6,14 @@
 
 #include "cache.h"
 
+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
+					      const char *filename, time_t mtime);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
 
 void plug_bulk_checkin(void);
-void unplug_bulk_checkin(void);
+void unplug_bulk_checkin(struct lock_file *);
 
 #endif
diff --git a/cache.h b/cache.h
index d23de693680..39b3a88181a 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..1b403e00241 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d6b22ede7ea..3e23eafff80 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index a8be8994814..ea14c3a3483 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 	return 0;
 }
 
-/* Finalize a file on disk, and close it. */
-static void close_loose_object(int fd)
-{
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
-	if (close(fd) != 0)
-		die_errno(_("error when closing loose object file"));
-}
-
 /* Size of directory component, including the ending '/' */
 static inline int directory_size(const char *filename)
 {
@@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 		die(_("confused by unstable object source data for %s"),
 		    oid_to_hex(oid));
 
-	close_loose_object(fd);
-
-	if (mtime) {
-		struct utimbuf utb;
-		utb.actime = mtime;
-		utb.modtime = mtime;
-		if (utime(tmp_file.buf, &utb) < 0)
-			warning_errno(_("failed utime() on %s"), tmp_file.buf);
-	}
-
-	return finalize_object_file(tmp_file.buf, filename.buf);
+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
+							 filename.buf, mtime);
 }
 
 static int freshen_loose_object(const struct object_id *oid)
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * on macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget


* [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode
From: Neeraj Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function it calls has
been available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.

We ask the operating system to flush only file data, without issuing a
hardware flush primitive. A later full fsync causes the metadata log
and then the disk cache to be flushed on NTFS and ReFS. Other
filesystems treat this as a full flush operation.
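
The fallback rule above can be modeled portably as follows. writeout_fn is a hypothetical type standing in for a data-only flush primitive, such as a wrapper around NtFlushBuffersFileEx, and a NULL pointer models the lazy lookup failing on Windows versions before 8. This is a sketch of the decision, not the patch's code; the patch makes the same choice inside fsync_and_close_loose_object_bulk_checkin.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical type for a platform "flush file data only" primitive. */
typedef int (*writeout_fn)(int fd);

/*
 * Returns 1 if only a writeout was done (one batched hardware flush is
 * still needed later), 0 if a full fsync was done instead, and -1 if
 * even the full fsync failed.
 */
int writeout_or_fsync(int fd, writeout_fn writeout)
{
	if (writeout && writeout(fd) == 0)
		return 1;
	/* Primitive missing or failed (e.g. ENOSYS): flush this file fully now. */
	return fsync(fd) < 0 ? -1 : 0;
}

/* Example stubs, for illustration only. */
int writeout_unavailable(int fd)
{
	(void)fd;
	errno = ENOSYS;
	return -1;
}

int writeout_ok(int fd)
{
	(void)fd;
	return 0;
}
```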

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 29 +++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..c013920ce37
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,29 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+	IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+	DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+	if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+	}
+
+	/* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
+	memset(&io_status, 0, sizeof(io_status));
+	if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index bb4f9f043ce..1a1e2fba9c9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure
  2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-09-20 22:15       ` [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-20 22:15       ` Neeraj Singh via GitGitGadget
  2021-09-21 23:46         ` Ævar Arnfjörð Bjarmason
  2021-09-20 22:15       ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
                         ` (2 subsequent siblings)
  6 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin in the update-index infrastructure to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality. This mode
is enabled when passing paths to update-index via the --stdin flag,
as is done by 'git stash'.

There is some risk with this change, since under batch fsync, the object
files will not be available until the update-index is entirely complete.
Relying on earlier availability is unlikely: any tool invoking
update-index and expecting to see objects would have to snoop the output
of --verbose to find out when update-index has actually processed a
given path.
Additionally the index is locked for the duration of the update.
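
The visibility risk can be modeled with a small, hypothetical plugged transaction: while plugged, a finished temp file is queued instead of renamed, so its final name does not exist until unplug. This is a sketch under simplifying assumptions (a fixed-size queue, no hardware flush, plain rename() instead of Git's finalize_object_file()); plug(), finish_object() and unplug() are illustrative names, not the bulk-checkin code itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_PENDING 64

/* Hypothetical record of one deferred (temp -> final) rename. */
struct pending_rename {
	char tmp_path[512];
	char final_path[512];
};

static struct pending_rename pending[MAX_PENDING];
static int pending_nr;
static int plugged;

static void plug(void)
{
	plugged = 1;
}

/*
 * Called once an object's temp file is complete. Unplugged, it is
 * renamed immediately; plugged, the rename is queued, so the object
 * is not visible under its final name yet.
 */
static int finish_object(const char *tmp_path, const char *final_path)
{
	if (!plugged)
		return rename(tmp_path, final_path);
	assert(pending_nr < MAX_PENDING);
	snprintf(pending[pending_nr].tmp_path, sizeof(pending[0].tmp_path), "%s", tmp_path);
	snprintf(pending[pending_nr].final_path, sizeof(pending[0].final_path), "%s", final_path);
	pending_nr++;
	return 0;
}

/* The real design issues one hardware flush here; this sketch only renames. */
static int unplug(void)
{
	int i, rc = 0;

	for (i = 0; i < pending_nr; i++)
		if (rename(pending[i].tmp_path, pending[i].final_path) < 0)
			rc = -1;
	pending_nr = 0;
	plugged = 0;
	return rc;
}
```

Calling plug(), then finish_object(tmp, final), leaves final absent until unplug() runs. That gap is what a concurrent reader of the object database could observe.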

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..b0689f2cdf6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1150,6 +1151,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		struct strbuf unquoted = STRBUF_INIT;
 
 		setup_work_tree();
+		plug_bulk_checkin();
 		while (getline_fn(&buf, stdin) != EOF) {
 			char *p;
 			if (!nul_term_line && buf.buf[0] == '"') {
@@ -1164,6 +1166,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 				chmod_path(set_executable_bit, p);
 			free(p);
 		}
+		unplug_bulk_checkin(&lock_file);
 		strbuf_release(&unquoted);
 		strbuf_release(&buf);
 	}
-- 
gitgitgadget


* [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
  2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-09-20 22:15       ` [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-20 22:15       ` Neeraj Singh via GitGitGadget
  2021-09-21 23:54         ` Ævar Arnfjörð Bjarmason
  2021-09-20 22:15       ` [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  6 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add test cases to exercise batch mode for 'git add'
and 'git stash'. These tests ensure that the added
data winds up in the object database.

I verified the tests by introducing an incorrect rename
in do_sync_and_rename.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
 t/t3700-add.sh        | 11 +++++++++++
 t/t3903-stash.sh      | 14 ++++++++++++++
 3 files changed, 59 insertions(+)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a8a25eba61d
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,34 @@
+# Helper to create files with unique contents
+
+test_create_unique_files_base__=$(date -u)
+test_create_unique_files_counter__=0
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
+#				    each in the specified directory, all
+#				    with unique contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+
+	rm -rf "$basedir" >/dev/null
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir" > /dev/null
+		for j in $(test_seq $files)
+		do
+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..2122acc3e9e 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.'
 
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -33,6 +35,15 @@ test_expect_success \
     'Test that "git add -- -q" works' \
     'touch -- -q && git add -- -q'
 
+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	cat fsynced_files | awk '{print \$2}' | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 873aa56e359..0b4e8bb55b8 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&
 
-- 
gitgitgadget


* [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash
  2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-09-20 22:15       ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-20 22:15       ` Neeraj Singh via GitGitGadget
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  6 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-20 22:15 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget

* Re: [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-20 22:15       ` [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-21 23:16         ` Ævar Arnfjörð Bjarmason
  2021-09-22  1:23           ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-21 23:16 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:

> When the new mode is enabled we do the following for new objects:
>
> 1. Create a tmp_obj_XXXX file and write the object data to it.
> 2. Issue a pagecache writeback request and wait for it to complete.
> 3. Record the tmp name and the final name in the bulk-checkin state for
>    later rename.
>
> At the end of the entire transaction we:
> 1. Issue a fsync against the lock file to flush the hardware writeback
>    cache, which should by now have processed the tmp file writes.
> 2. Rename all of the temp files to their final names.
> 3. When updating the index and/or refs, we assume that Git will issue
>    another fsync internal to that operation.

Perhaps note too that:

4. For loose objects, refs etc. we may or may not create directories,
   and most certainly will be updating metadata on the immediate
   directory containing the file, but none of that's fsync()'d.
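
On POSIX systems, the usual way to make a rename (or a newly created directory entry) durable is to also fsync() the containing directory, which, per point 4, the series does not do. A minimal sketch, assuming a Linux-like system where fsync() on a directory opened O_RDONLY is supported (some platforms reject it). The helper name is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Rename src to dst, then fsync the directory that holds dst so the new
 * directory entry itself is persisted. dir must name that directory.
 * Returns 0 on success, -1 on failure with errno from the failing call.
 */
static int rename_and_sync_dir(const char *src, const char *dst, const char *dir)
{
	int dfd, rc, saved_errno;

	if (rename(src, dst) < 0)
		return -1;
	/* The rename changed dir's entries; flush the directory itself. */
	dfd = open(dir, O_RDONLY);
	if (dfd < 0)
		return -1;
	rc = fsync(dfd);
	saved_errno = errno;
	close(dfd);
	errno = saved_errno;
	return rc;
}
```

Creating the fan-out directory (e.g. objects/ab/) would similarly need its parent directory synced.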

> On a filesystem with a singular journal that is updated during name
> operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> would expect the fsync to trigger a journal writeout so that this
> sequence is enough to ensure that the user's data is durable by the time
> the git command returns.
>
> This change also updates the macOS code to trigger a real hardware flush
> via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> macOS there was no guarantee of durability since a simple fsync(2) call
> does not flush any hardware caches.

There's no discussion of whether this is or isn't known to also work
on some Linux filesystems, and on the OSes where it does work, is this
only for the object files themselves, or does metadata also "ride along"?

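A sketch of the full-flush side on macOS, where fsync(2) does not flush the drive cache and fcntl(fd, F_FULLFSYNC) is the documented way to request that. Unlike the git_fsync() in this patch, the sketch falls back to plain fsync() if F_FULLFSYNC fails, on the assumption that some filesystems may reject it. The helper name is hypothetical; on systems without F_FULLFSYNC it is just fsync().

```c
#define _XOPEN_SOURCE 700
#ifdef __APPLE__
#define _DARWIN_C_SOURCE /* keep F_FULLFSYNC visible alongside _XOPEN_SOURCE */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Request that fd's data reach stable storage, including the drive's
 * write cache where the platform offers a way to ask for that.
 */
static int full_flush(int fd)
{
#ifdef F_FULLFSYNC
	/* macOS: fsync() alone leaves data in the drive cache. */
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return 0;
	/* Assumed fallback for filesystems that reject F_FULLFSYNC. */
#endif
	return fsync(fd);
}
```
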
> _Performance numbers_:
>
> Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
> Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
> Windows - Same host as Linux, a preview version of Windows 11.
> 	  This number is from a patch later in the series.
>
> Adding 500 files to the repo with 'git add' Times reported in seconds.
>
> core.fsyncObjectFiles | Linux | Mac   | Windows
> ----------------------|-------|-------|--------
>                 false | 0.06  |  0.35 | 0.61
>                 true  | 1.88  | 11.18 | 2.47
>                 batch | 0.15  |  0.41 | 1.53

Per my https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com
and 6/6 in this series we've got perf tests for add/stash, but it would
be really interesting to see how this is impacted by
transfer.unpackLimit in cases where we may be writing packs or loose
objects.
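
To put the quoted table in perspective, here is a small C calculation of how much faster `batch` is than `true`, and what `true` costs per object, for the 500-file `git add` run. The numbers are copied from the table; nothing is re-measured, and the struct and function names are illustrative.

```c
#include <assert.h>

/* Seconds to add 500 files, copied from the performance table. */
struct timing {
	const char *os;
	double off, on, batch;
};

static const struct timing timings[] = {
	{ "Linux",   0.06,  1.88, 0.15 },
	{ "Mac",     0.35, 11.18, 0.41 },
	{ "Windows", 0.61,  2.47, 1.53 },
};

#define NFILES 500

/* How many times faster batch mode is than one fsync per object. */
static double batch_speedup(const struct timing *t)
{
	return t->on / t->batch;
}

/* Average milliseconds per object when every object is fsync'd. */
static double ms_per_file_fsync(const struct timing *t)
{
	return t->on * 1000.0 / NFILES;
}
```

This gives roughly 12.5x (Linux), 27.3x (Mac) and 1.6x (Windows) speedups for `batch`, and about 3.8, 22.4 and 4.9 ms per object with `true`.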

> [...]
>  core.fsyncObjectFiles::
> -	This boolean will enable 'fsync()' when writing object files.
> -+
> -This is a total waste of time and effort on a filesystem that orders
> -data writes properly, but can be useful for filesystems that do not use
> -journalling (traditional UNIX filesystems) or that only journal metadata
> -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> +	A value indicating the level of effort Git will expend in
> +	trying to make objects added to the repo durable in the event
> +	of an unclean system shutdown. This setting currently only
> +	controls the object store, so updates to any refs or the
> +	index may not be equally durable.

All these mentions of "object" should really clarify that it's "loose
objects", i.e. we always fsync pack files. 

> +* `false` allows data to remain in file system caches according to
> +  operating system policy, whence it may be lost if the system loses power
> +  or crashes.

As noted in point #4 of
https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/ while
this direction is overall an improvement over the previously flippant
docs, they at least alluded to the context that the assumption behind
"false" is that you don't really care about loose objects, you care
about loose objects *and* the ref update or whatever.

As I think we've already covered (this is from memory), this may all
have been based on some old ext3 assumption, but it's probably worth
summarizing that here, i.e. if you've got an FS with globally ordered
operations you can probably skip this, otherwise probably not, etc.

> +* `true` triggers a data integrity flush for each object added to the
> +  object store. This is the safest setting that is likely to ensure durability
> +  across all operating systems and file systems that honor the 'fsync' system
> +  call. However, this setting comes with a significant performance cost on
> +  common hardware.

This is really overpromising things by omitting the fact that even if
the feature you've hacked up here works right, we're still not fsyncing
directory entries etc. (also noted above).

So we need something that describes the narrow scope here, along with
"loose objects" etc....

> +* `batch` enables an experimental mode that uses interfaces available in some
> +  operating systems to write object data with a minimal set of FLUSH CACHE
> +  (or equivalent) commands sent to the storage controller. If the operating
> +  system interfaces are not available, this mode behaves the same as `true`.
> +  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
> +  filesystems and on Windows for repos stored on NTFS or ReFS.

Again, even if it's called "core.fsyncObjectFiles", if we're going to say
"safe" we really need to say in what sense it's safe. Having written and
fsync()'d the file helps nobody if the metadata never arrives....

> +static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
> +{
> +	if (fsync_state->nr) {

I think less indentation here would be nice:

    if (!fsync_state->nr)
        return;
    /* rest of unindented body */

Or better yet do this check in unplug_bulk_checkin(), then here:

    fsync_or_die();
    for_each_string_list_item() { ...}
    string_list_clear(....);

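The guard-clause shape suggested above, as a self-contained sketch. A plain array of (src, dst) pairs stands in for the string_list, and errors are returned rather than passed to die(); both are assumptions for illustration, and the types here are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the string_list of (tmp, final) names. */
struct rename_pair {
	const char *src;
	const char *dst;
};

struct rename_list {
	struct rename_pair *items;
	size_t nr;
};

/*
 * Nothing queued: no flush and no renames. Otherwise issue one flush on
 * flush_fd, then rename every pair. Returns 0 on success, -1 on the
 * first failure.
 */
static int do_sync_and_rename(struct rename_list *list, int flush_fd)
{
	size_t i;

	if (!list->nr)
		return 0;

	if (fsync(flush_fd) < 0)
		return -1;

	for (i = 0; i < list->nr; i++)
		if (rename(list->items[i].src, list->items[i].dst) < 0)
			return -1;

	list->nr = 0;
	return 0;
}
```

With the early return, an empty list never touches the lock file's descriptor, so a plug/unplug pair that wrote no loose objects costs nothing.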

> +		struct string_list_item *rename;
> +
> +		/*
> +		 * Issue a full hardware flush against the lock file to ensure
> +		 * that all objects are durable before any renames occur.
> +		 * The code in fsync_and_close_loose_object_bulk_checkin has
> +		 * already ensured that writeout has occurred, but it has not
> +		 * flushed any writeback cache in the storage hardware.
> +		 */
> +		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
> +
> +		for_each_string_list_item(rename, fsync_state) {
> +			const char *src = rename->string;
> +			const char *dst = rename->util;
> +
> +			if (finalize_object_file(src, dst))
> +				die_errno(_("could not rename '%s' to '%s'"), src, dst);
> +		}
> +
> +		string_list_clear(fsync_state, 1);
> +	}
> +}
> +
>  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
>  {
>  	int i;
> @@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
>  	return 0;
>  }
>  
> +static void add_rename_bulk_checkin(struct string_list *fsync_state,
> +				    const char *src, const char *dst)
> +{
> +	string_list_insert(fsync_state, src)->util = xstrdup(dst);
> +}

It just has one caller; why not inline the string_list_insert()
call...

> +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
> +					      const char *filename, time_t mtime)
> +{
> +	int do_finalize = 1;
> +	int ret = 0;
> +
> +	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {

Let's do positive enum comparisons, using switch() statements, so the
compiler helps us see whether we've covered them all.

> +		/*
> +		 * If we have a plugged bulk checkin, we issue a call that
> +		 * cleans the filesystem page cache but avoids a hardware flush
> +		 * command. Later on we will issue a single hardware flush
> +		 * before renaming files as part of do_sync_and_rename.
> +		 */
> +		if (bulk_checkin_plugged &&
> +		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
> +		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
> +			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
> +			do_finalize = 0;
> +
> +		} else {
> +			fsync_or_die(fd, "loose object file");
> +		}
> +	}

So nothing ever explicitly checks FSYNC_OBJECT_FILES_ON...?
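
Here is a sketch of the positive, switch()-based version with a lower-case enum type name. The `enum loose_object_action` type and `choose_action()` helper are hypothetical. Each mode, including FSYNC_OBJECT_FILES_ON, has its own case and there is no default, so GCC's -Wswitch (part of -Wall) and Clang's equivalent flag any mode added later but not handled. In the real code the writeout attempt has side effects and should only run for plugged batch mode; the sketch simply takes its result as a parameter.

```c
#include <assert.h>

/* Lower-case type name per Git style; only the labels are upper-case. */
enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Hypothetical per-object decision. */
enum loose_object_action {
	ACTION_NO_SYNC,      /* close and rename immediately */
	ACTION_FULL_FSYNC,   /* fsync this file, then rename */
	ACTION_DEFER_RENAME  /* writeout done; flush and rename at unplug */
};

static enum loose_object_action choose_action(enum fsync_object_files_mode mode,
					      int plugged, int writeout_ok)
{
	switch (mode) {
	case FSYNC_OBJECT_FILES_OFF:
		return ACTION_NO_SYNC;
	case FSYNC_OBJECT_FILES_ON:
		return ACTION_FULL_FSYNC;
	case FSYNC_OBJECT_FILES_BATCH:
		if (plugged && writeout_ok)
			return ACTION_DEFER_RENAME;
		return ACTION_FULL_FSYNC;
	}
	/* Only reachable with an out-of-range value; fail safe. */
	assert(0);
	return ACTION_FULL_FSYNC;
}
```
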

> -extern int fsync_object_files;
> +enum FSYNC_OBJECT_FILES_MODE {
> +    FSYNC_OBJECT_FILES_OFF,
> +    FSYNC_OBJECT_FILES_ON,
> +    FSYNC_OBJECT_FILES_BATCH
> +};

Style: We don't use ALL_CAPS for type names in this codebase, just the
enum labels themselves....

> +extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;

...to the point where I had to rub my eyes to see what was going on here
... :)


> -		fsync_object_files = git_config_bool(var, value);
> +		if (value && !strcmp(value, "batch"))
> +			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> +		else if (git_config_bool(var, value))
> +			fsync_object_files = FSYNC_OBJECT_FILES_ON;
> +		else
> +			fsync_object_files = FSYNC_OBJECT_FILES_OFF;

Since the point of this setting is safety, let's explicitly check
true/false here, use git_config_maybe_bool(), and perhaps issue a
warning on unknown values, but maybe that would get too verbose...

If we have a future "supersafe" mode, it'll get mapped to "false" on
older versions of git, probably not a good idea...
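
Here is a standalone sketch of stricter parsing along those lines, where an unknown value such as "supersafe" is reported to the caller instead of silently becoming a boolean. The boolean words follow the spellings documented for git-config (true/yes/on/1, false/no/off/0, the empty string, and a key with no value meaning true). However, this is not git_config_maybe_bool(): it accepts only 0 and 1 as numbers, and the function names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Returns 1 for true, 0 for false, -1 if value is not a boolean word. */
static int parse_bool_word(const char *value)
{
	if (!value)
		return 1; /* key present without "= value" means true */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!*value || !strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

/* Returns 0 and sets *mode on success; -1 (mode untouched) if unknown. */
static int parse_fsync_mode(const char *value, enum fsync_object_files_mode *mode)
{
	int b;

	if (value && !strcmp(value, "batch")) {
		*mode = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = parse_bool_word(value);
	if (b < 0)
		return -1; /* let the caller warn or die rather than guess */
	*mode = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```
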

>  		return 0;
>  	}
>  
> diff --git a/config.mak.uname b/config.mak.uname
> index 76516aaa9a5..e6d482fbcc6 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
>  	HAVE_CLOCK_MONOTONIC = YesPlease
>  	# -lrt is needed for clock_gettime on glibc <= 2.16
>  	NEEDS_LIBRT = YesPlease
> +	HAVE_SYNC_FILE_RANGE = YesPlease
>  	HAVE_GETDELIM = YesPlease
>  	SANE_TEXT_GREP=-a
>  	FREAD_READS_DIRECTORIES = UnfortunatelyYes
> diff --git a/configure.ac b/configure.ac
> index 031e8d3fee8..c711037d625 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
>  	[AC_MSG_RESULT([no])
>  	HAVE_CLOCK_MONOTONIC=])
>  GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
> +
> +#
> +# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
> +GIT_CHECK_FUNC(sync_file_range,
> +	[HAVE_SYNC_FILE_RANGE=YesPlease],
> +	[HAVE_SYNC_FILE_RANGE])
> +GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
> +
>  #
>  # Define NO_SETITIMER if you don't have setitimer.
>  GIT_CHECK_FUNC(setitimer,
> diff --git a/environment.c b/environment.c
> index d6b22ede7ea..3e23eafff80 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -43,7 +43,7 @@ const char *git_hooks_path;
>  int zlib_compression_level = Z_BEST_SPEED;
>  int core_compression_level;
>  int pack_compression_level = Z_DEFAULT_COMPRESSION;
> -int fsync_object_files;
> +enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>  size_t delta_base_cache_limit = 96 * 1024 * 1024;
> diff --git a/git-compat-util.h b/git-compat-util.h
> index b46605300ab..d14e2436276 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
>  void BUG(const char *fmt, ...);
>  #endif
>  
> +enum fsync_action {
> +    FSYNC_WRITEOUT_ONLY,
> +    FSYNC_HARDWARE_FLUSH
> +};
> +
> +int git_fsync(int fd, enum fsync_action action);
> +
>  /*
>   * Preserves errno, prints a message, but gives no warning for ENOENT.
>   * Returns 0 on success, which includes trying to unlink an object that does
> diff --git a/object-file.c b/object-file.c
> index a8be8994814..ea14c3a3483 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1859,15 +1859,6 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  	return 0;
>  }
>  
> -/* Finalize a file on disk, and close it. */
> -static void close_loose_object(int fd)
> -{
> -	if (fsync_object_files)
> -		fsync_or_die(fd, "loose object file");
> -	if (close(fd) != 0)
> -		die_errno(_("error when closing loose object file"));
> -}
> -
>  /* Size of directory component, including the ending '/' */
>  static inline int directory_size(const char *filename)
>  {
> @@ -1973,17 +1964,8 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  		die(_("confused by unstable object source data for %s"),
>  		    oid_to_hex(oid));
>  
> -	close_loose_object(fd);
> -
> -	if (mtime) {
> -		struct utimbuf utb;
> -		utb.actime = mtime;
> -		utb.modtime = mtime;
> -		if (utime(tmp_file.buf, &utb) < 0)
> -			warning_errno(_("failed utime() on %s"), tmp_file.buf);
> -	}
> -
> -	return finalize_object_file(tmp_file.buf, filename.buf);
> +	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
> +							 filename.buf, mtime);
>  }
>  
>  static int freshen_loose_object(const struct object_id *oid)
> diff --git a/wrapper.c b/wrapper.c
> index 7c6586af321..bb4f9f043ce 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
>  	return fd;
>  }
>  
> +int git_fsync(int fd, enum fsync_action action)
> +{
> +	switch (action) {
> +	case FSYNC_WRITEOUT_ONLY:
> +
> +#ifdef __APPLE__
> +		/*
> +		 * on macOS, fsync just causes filesystem cache writeback but does not
> +		 * flush hardware caches.
> +		 */
> +		return fsync(fd);
> +#endif
> +
> +#ifdef HAVE_SYNC_FILE_RANGE
> +		/*
> +		 * On linux 2.6.17 and above, sync_file_range is the way to issue
> +		 * a writeback without a hardware flush. An offset of 0 and size of 0
> +		 * indicates writeout of the entire file and the wait flags ensure that all
> +		 * dirty data is written to the disk (potentially in a disk-side cache)
> +		 * before we continue.
> +		 */
> +
> +		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
> +						 SYNC_FILE_RANGE_WRITE |
> +						 SYNC_FILE_RANGE_WAIT_AFTER);
> +#endif
> +
> +		errno = ENOSYS;
> +		return -1;
> +
> +	case FSYNC_HARDWARE_FLUSH:
> +
> +#ifdef __APPLE__
> +		return fcntl(fd, F_FULLFSYNC);
> +#else
> +		return fsync(fd);
> +#endif
> +
> +	default:
> +		BUG("unexpected git_fsync(%d) call", action);
> +	}
> +
> +}
> +
>  static int warn_if_unremovable(const char *op, const char *file, int rc)
>  {
>  	int err;
> diff --git a/write-or-die.c b/write-or-die.c
> index d33e68f6abb..8f53953d4ab 100644
> --- a/write-or-die.c
> +++ b/write-or-die.c
> @@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
>  
>  void fsync_or_die(int fd, const char *msg)
>  {
> -	while (fsync(fd) < 0) {
> +	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
>  		if (errno != EINTR)
>  			die_errno("fsync error on '%s'", msg);
>  	}


* Re: [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-20 22:15       ` [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-21 23:42         ` Ævar Arnfjörð Bjarmason
  2021-09-22  1:23           ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-21 23:42 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:

> +int win32_fsync_no_flush(int fd)
> +{
> +       IO_STATUS_BLOCK io_status;
> +
> +#define FLUSH_FLAGS_FILE_DATA_ONLY 1
> +
> +       DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
> +			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
> +			 PIO_STATUS_BLOCK IoStatusBlock);
> +
> +       if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
> +		errno = ENOSYS;
> +		return -1;
> +       }
> +
> +       /* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
> +       memset(&io_status, 0, sizeof(io_status));

Is this just an informative link to the API docs, or is the comment
about the memset() in particular? This comment seems like it's just doing
a Google/Bing search for you, so maybe it's better without it?

* Re: [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure
  2021-09-20 22:15       ` [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-21 23:46         ` Ævar Arnfjörð Bjarmason
  2021-09-22  1:27           ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-21 23:46 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> The update-index functionality is used internally by 'git stash push' to
> setup the internal stashed commit.
>
> This change enables bulk-checkin for update-index infrastructure to
> speed up adding new objects to the object database by leveraging the
> pack functionality and the new bulk-fsync functionality. This mode
> is enabled when passing paths to update-index via the --stdin flag,
> as is done by 'git stash'.
>
> There is some risk with this change, since under batch fsync, the object
> files will not be available until the update-index is entirely complete.
> This usage is unlikely, since any tool invoking update-index and
> expecting to see objects would have to snoop the output of --verbose to
> find out when update-index has actually processed a given path.
> Additionally the index is locked for the duration of the update.

Would you really need to sniff the verbose output? If I'm streaming data
to update-index, it looks like I could previously assume that
update-index had done the work once I managed to fflush() to it, since
it processes one line at a time and does the work in that
line-at-a-time loop.

I.e. you could print lines to it, and then do concurrent object lookups
knowing the data was written already...

I think this is probably fine, but that case seems way likelier than
someone sniffing back the verbose output, presumably for the "add" in
update_one(), but that's called in the getline_fn() loop...

All of this makes me wonder why this isn't using tmp-objdir.c, i.e. we
could have our cake and eat it too by writing the "real" objects, and
then just renaming them between directories instead. But perhaps the
answer has something to do with the metadata issues I raised.

And well, tmp-objdir.c isn't going to help someone in practice that's
relying on this "update-index --stdin" behavior, as they won't know
where we staged the temporary files...



* Re: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
  2021-09-20 22:15       ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-21 23:54         ` Ævar Arnfjörð Bjarmason
  2021-09-22  1:30           ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-21 23:54 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:

> From: Neeraj Singh <neerajsi@microsoft.com>
>
> Add test cases to exercise batch mode for 'git add'
> and 'git stash'. These tests ensure that the added
> data winds up in the object database.
>
> I verified the tests by introducing an incorrect rename
> in do_sync_and_rename.
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
>  t/t3700-add.sh        | 11 +++++++++++
>  t/t3903-stash.sh      | 14 ++++++++++++++
>  3 files changed, 59 insertions(+)
>  create mode 100644 t/lib-unique-files.sh
>
> diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
> new file mode 100644
> index 00000000000..a8a25eba61d
> --- /dev/null
> +++ b/t/lib-unique-files.sh
> @@ -0,0 +1,34 @@
> +# Helper to create files with unique contents
> +
> +test_create_unique_files_base__=$(date -u)
> +test_create_unique_files_counter__=0
> +
> +# Create multiple files with unique contents. Takes the number of
> +# directories, the number of files in each directory, and the base
> +# directory.
> +#
> +# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
> +#				    each in the specified directory, all
> +#				    with unique contents.
> +
> +test_create_unique_files() {
> +	test "$#" -ne 3 && BUG "3 param"
> +
> +	local dirs=$1
> +	local files=$2
> +	local basedir=$3
> +
> +	rm -rf $basedir >/dev/null

Why the >/dev/null? It's not a "-rfv", and any errors would go to
stderr.

> +		mkdir -p "$dir" > /dev/null

Ditto.

> +		for j in $(test_seq $files)
> +		do
> +			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
> +			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"

Would be much more readable if these variables were shorter.

But actually, why are we trying to create files as a function of
"date -u" at all? This is all in the trash directory, which is rm -rf'd
between runs, so why aren't names created with test_seq or whatever OK?
I.e. just 1.txt, 2.txt....

> +test_expect_success 'stash with core.fsyncobjectfiles=batch' "
> +	test_create_unique_files 2 4 fsync-files &&
> +	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
> +	rm -f fsynced_files &&
> +
> +	# The files were untracked, so use the third parent,
> +	# which contains the untracked files
> +	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
> +	test_line_count = 8 fsynced_files &&
> +	cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
> +"
> +
> +
>  test_expect_success 'stash -c stash.useBuiltin=false warning ' '
>  	expected="stash.useBuiltin support has been removed" &&

We really prefer our tests to create the same data each time if
possible, but as noted with the "date -u" comment above you're
explicitly bypassing that, but I still can't see why...


* Re: [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-21 23:16         ` Ævar Arnfjörð Bjarmason
@ 2021-09-22  1:23           ` Neeraj Singh
  2021-09-22  2:02             ` Ævar Arnfjörð Bjarmason
  2021-09-22 19:46             ` Neeraj Singh
  0 siblings, 2 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-22  1:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 4:41 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>
> > When the new mode is enabled we do the following for new objects:
> >
> > 1. Create a tmp_obj_XXXX file and write the object data to it.
> > 2. Issue a pagecache writeback request and wait for it to complete.
> > 3. Record the tmp name and the final name in the bulk-checkin state for
> >    later rename.
> >
> > At the end of the entire transaction we:
> > 1. Issue a fsync against the lock file to flush the hardware writeback
> >    cache, which should by now have processed the tmp file writes.
> > 2. Rename all of the temp files to their final names.
> > 3. When updating the index and/or refs, we assume that Git will issue
> >    another fsync internal to that operation.
>
> Perhaps note too that:
>
> 4. For loose objects, refs etc. we may or may not create directories,
>    and most certainly will be updating metadata on the immediate
>    directory containing the file, but none of that's fsync()'d.
>
> > On a filesystem with a singular journal that is updated during name
> > operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
> > would expect the fsync to trigger a journal writeout so that this
> > sequence is enough to ensure that the user's data is durable by the time
> > the git command returns.
> >
> > This change also updates the macOS code to trigger a real hardware flush
> > via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
> > macOS there was no guarantee of durability since a simple fsync(2) call
> > does not flush any hardware caches.
>
> There's no discussion of whether this is or isn't known to also work on
> some Linux FSs, and for these OSs where this does work, is this only
> for the object files themselves, or does metadata also "ride along"?
>

I unfortunately can't examine Linux kernel source code, and the details
of metadata consistency behavior across files are not something that
anyone in that group wants to pin down. As far as I can tell, the only
thing that's really guaranteed is fsyncing every single file you write
and its parent directory if you're creating a new file (which we always
are). As came up in conversation with Christoph Hellwig elsewhere on the
thread, Linux doesn't have any set of syscalls to make batch mode safe.
It does look like XFS would be safe if sync_file_range actually promised
to wait for all pagecache writeback definitively, since it would do a
"log force" to push all the dirty metadata to disk when we do our final
fsync.

I really didn't want to say something definitive about what Linux can
or will do, since I'm not in a position to really know or influence
them. Christoph did say that he would be interested in contributing a
variant of this patch that would be definitively safe on filesystems
that honor syncfs.

> > _Performance numbers_:
> >
> > Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
> > Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
> > Windows - Same host as Linux, a preview version of Windows 11.
> >         This number is from a patch later in the series.
> >
> > Adding 500 files to the repo with 'git add' Times reported in seconds.
> >
> > core.fsyncObjectFiles | Linux | Mac   | Windows
> > ----------------------|-------|-------|--------
> >                 false | 0.06  |  0.35 | 0.61
> >                 true  | 1.88  | 11.18 | 2.47
> >                 batch | 0.15  |  0.41 | 1.53
>
> Per my https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com
> and 6/6 in this series we've got perf tests for add/stash, but it would
> be really interesting to see how this is impacted by
> transfer.unpackLimit in cases where we may be writing packs or loose
> objects.

I'm having trouble understanding how unpackLimit is related to 'git stash'
or 'git add'. From code inspection, it doesn't look like we're using those
settings for adding objects except from across a transport.

Are you proposing that we have a similar setting for adding objects via
'add' using a packfile? I think that would be a good goal, but it might be
a bit tricky, since we've likely done a lot of the work to buffer the
input objects in order to compute their OIDs before we know how many
objects there are to add. If the policy were to "always add to a
packfile", it would be easier.

>
> > [...]
> >  core.fsyncObjectFiles::
> > -     This boolean will enable 'fsync()' when writing object files.
> > -+
> > -This is a total waste of time and effort on a filesystem that orders
> > -data writes properly, but can be useful for filesystems that do not use
> > -journalling (traditional UNIX filesystems) or that only journal metadata
> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
> > +     A value indicating the level of effort Git will expend in
> > +     trying to make objects added to the repo durable in the event
> > +     of an unclean system shutdown. This setting currently only
> > +     controls the object store, so updates to any refs or the
> > +     index may not be equally durable.
>
> All these mentions of "object" should really clarify that it's "loose
> objects", i.e. we always fsync pack files.
>
> > +* `false` allows data to remain in file system caches according to
> > +  operating system policy, whence it may be lost if the system loses power
> > +  or crashes.
>
> As noted in point #4 of
> https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/ while
> this direction is overall an improvement over the previously flippant
> docs, they at least alluded to the context that the assumption behind
> "false" is that you don't really care about loose objects, you care
> about loose objects *and* the ref update or whatever.
>
> As I think (this is from memory) we've covered already this may have
> been all based on some old ext3 assumption, but it's probably worth
> summarizing that here, i.e. if you've got an FS with global ordered
> operations you can probably skip this, but probably not etc.
>
> > +* `true` triggers a data integrity flush for each object added to the
> > +  object store. This is the safest setting that is likely to ensure durability
> > +  across all operating systems and file systems that honor the 'fsync' system
> > +  call. However, this setting comes with a significant performance cost on
> > +  common hardware.
>
> This is really overpromising things by omitting the fact that even if
> we're getting this feature you've hacked up right, we're still not
> fsyncing dir entries etc (also noted above).
>
> So something that describes the narrow scope here, along with "loose
> objects" etc....
>
> > +* `batch` enables an experimental mode that uses interfaces available in some
> > +  operating systems to write object data with a minimal set of FLUSH CACHE
> > +  (or equivalent) commands sent to the storage controller. If the operating
> > +  system interfaces are not available, this mode behaves the same as `true`.
> > +  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
> > +  filesystems and on Windows for repos stored on NTFS or ReFS.
>
> Again, even if it's called "core.fsyncObjectFiles" if we're going to say
> "safe" we really need to say safe in what sense. Having written and
> fsync()'d the file is helping nobody if the metadata never arrives....
>

My concern with your feedback here is that this is user-facing
documentation. I'd assume that people who are not intimately familiar
with both their filesystem and Git's internals would just be completely
mystified by a long commentary on the specifics in the Config
documentation. I think over time Git should focus on making this setting
really guarantee durability in a meaningful way across the entire
repository.

> > +static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
> > +{
> > +     if (fsync_state->nr) {
>
> I think less indentation here would be nice:
>
>     if (!fsync_state->nr)
>         return;
>     /* rest of unindented body */
>

Will fix.

> Or better yet do this check in unplug_bulk_checkin(), then here:
>
>     fsync_or_die();
>     for_each_string_list_item() { ...}
>     string_list_clear(....);
>
>

I'd prefer to put it in the callee for reasons of separation of
concerns. I don't want to have the caller and callee each partially
implement the contract. The compiler should do a good enough job, since
there's only one caller and it will probably get totally inlined.

> > +             struct string_list_item *rename;
> > +
> > +             /*
> > +              * Issue a full hardware flush against the lock file to ensure
> > +              * that all objects are durable before any renames occur.
> > +              * The code in fsync_and_close_loose_object_bulk_checkin has
> > +              * already ensured that writeout has occurred, but it has not
> > +              * flushed any writeback cache in the storage hardware.
> > +              */
> > +             fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
> > +
> > +             for_each_string_list_item(rename, fsync_state) {
> > +                     const char *src = rename->string;
> > +                     const char *dst = rename->util;
> > +
> > +                     if (finalize_object_file(src, dst))
> > +                             die_errno(_("could not rename '%s' to '%s'"), src, dst);
> > +             }
> > +
> > +             string_list_clear(fsync_state, 1);
> > +     }
> > +}
> > +
> >  static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
> >  {
> >       int i;
> > @@ -256,6 +286,53 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
> >       return 0;
> >  }
> >
> > +static void add_rename_bulk_checkin(struct string_list *fsync_state,
> > +                                 const char *src, const char *dst)
> > +{
> > +     string_list_insert(fsync_state, src)->util = xstrdup(dst);
> > +}
>
> Just has one caller, why not just inline the string_list_insert()
> call...
>

I thought about doing that before.  I'll do it.

> > +int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
> > +                                           const char *filename, time_t mtime)
> > +{
> > +     int do_finalize = 1;
> > +     int ret = 0;
> > +
> > +     if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
>
> Let's do positive enum comparisons, and with switch() statements, so the
> compiler helps us to see if we've covered them all.
>

Ok, will switch to switch.

> > +             /*
> > +              * If we have a plugged bulk checkin, we issue a call that
> > +              * cleans the filesystem page cache but avoids a hardware flush
> > +              * command. Later on we will issue a single hardware flush
> > +              * before renaming files as part of do_sync_and_rename.
> > +              */
> > +             if (bulk_checkin_plugged &&
> > +                 fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
> > +                 git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
> > +                     add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
> > +                     do_finalize = 0;
> > +
> > +             } else {
> > +                     fsync_or_die(fd, "loose object file");
> > +             }
> > +     }
>
> So nothing ever explicitly checks FSYNC_OBJECT_FILES_ON...?
>

Yeah, I did it this way to avoid any code duplication, but I can change to
a switch if it doesn't require too much repetition.
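As a rough illustration of the switch()-based shape suggested above, a minimal sketch might look like the following. The enum follows the naming style requested later in this thread (lower-case type name, upper-case labels); enum loose_object_action and choose_action() are hypothetical names used only for this sketch. Omitting a default: label lets -Wswitch (enabled by -Wall in GCC) warn when a new mode is added but not handled. The patch additionally falls back to a full fsync when the writeout-only call fails, which this sketch does not model.

```c
enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Hypothetical per-object decision; not a Git type. */
enum loose_object_action {
	ACTION_NO_SYNC,        /* leave data in the OS cache */
	ACTION_FULL_FSYNC,     /* fsync this object file right away */
	ACTION_WRITEOUT_DEFER, /* writeout only; flush and rename later */
};

static enum loose_object_action choose_action(enum fsync_object_files_mode mode,
					      int bulk_checkin_plugged)
{
	/* No default: label, so -Wswitch can flag an unhandled new mode. */
	switch (mode) {
	case FSYNC_OBJECT_FILES_OFF:
		return ACTION_NO_SYNC;
	case FSYNC_OBJECT_FILES_BATCH:
		if (bulk_checkin_plugged)
			return ACTION_WRITEOUT_DEFER;
		/* Outside a bulk checkin, batch behaves like "true". */
		return ACTION_FULL_FSYNC;
	case FSYNC_OBJECT_FILES_ON:
		return ACTION_FULL_FSYNC;
	}
	/* Not reachable for valid enum values; err on the safe side. */
	return ACTION_FULL_FSYNC;
}
```

Every mode now appears by name, so the "true" case is checked explicitly rather than being implied by "not off".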

> > -extern int fsync_object_files;
> > +enum FSYNC_OBJECT_FILES_MODE {
> > +    FSYNC_OBJECT_FILES_OFF,
> > +    FSYNC_OBJECT_FILES_ON,
> > +    FSYNC_OBJECT_FILES_BATCH
> > +};
>
> Style: We don't use ALL_CAPS for type names in this codebase, just the
> enum labels themselves....
>
> > +extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>
> ...to the point where I had to rub my eyes to see what was going on here
> ... :)
>

Sorry, Windows Developer :). Will fix.


> > -             fsync_object_files = git_config_bool(var, value);
> > +             if (value && !strcmp(value, "batch"))
> > +                     fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> > +             else if (git_config_bool(var, value))
> > +                     fsync_object_files = FSYNC_OBJECT_FILES_ON;
> > +             else
> > +                     fsync_object_files = FSYNC_OBJECT_FILES_OFF;
>
> Since the point of this setting is safety, let's explicitly check
> true/false here, use git_config_maybe_bool(), and perhaps issue a
> warning on unknown values, but maybe that would get too verbose...
>
> If we have a future "supersafe" mode, it'll get mapped to "false" on
> older versions of git, probably not a good idea...
>

I took Junio's suggestion verbatim. I'll try a warning if the value
exists and is not 'batch' or <maybe bool>.
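A sketch of that parsing, under the assumption that an unrecognized value should warn and fall back to the safer "true" behaviour rather than "false" (the concern raised above about a future, stricter mode being read by an older Git). maybe_bool() here is a hypothetical, simplified stand-in for Git's own maybe-bool parsing: it only accepts the textual spellings plus "0" and "1", whereas Git also accepts other integers.

```c
#include <stdio.h>
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Hypothetical, simplified maybe-bool parser: 1 for true, 0 for false,
 * -1 if the value is not a recognized boolean. A key with no value
 * (NULL) counts as true and an empty value as false.
 */
static int maybe_bool(const char *value)
{
	if (!value)
		return 1;
	if (!*value)
		return 0;
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

static enum fsync_object_files_mode parse_fsync_object_files(const char *value)
{
	if (value && !strcmp(value, "batch"))
		return FSYNC_OBJECT_FILES_BATCH;

	switch (maybe_bool(value)) {
	case 1:
		return FSYNC_OBJECT_FILES_ON;
	case 0:
		return FSYNC_OBJECT_FILES_OFF;
	default:
		/* Unknown (possibly newer) value: warn and prefer safety. */
		fprintf(stderr,
			"warning: unknown core.fsyncObjectFiles value '%s', treating it as 'true'\n",
			value);
		return FSYNC_OBJECT_FILES_ON;
	}
}
```

Falling back to "true" costs performance for a misspelled value but never silently weakens durability.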


Thanks for looking at my changes so thoroughly!
-Neeraj


* Re: [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-21 23:42         ` Ævar Arnfjörð Bjarmason
@ 2021-09-22  1:23           ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-22  1:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 4:44 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>
> > +int win32_fsync_no_flush(int fd)
> > +{
> > +       IO_STATUS_BLOCK io_status;
> > +
> > +#define FLUSH_FLAGS_FILE_DATA_ONLY 1
> > +
> > +       DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
> > +                      HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
> > +                      PIO_STATUS_BLOCK IoStatusBlock);
> > +
> > +       if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
> > +             errno = ENOSYS;
> > +             return -1;
> > +       }
> > +
> > +       /* See https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex */
> > +       memset(&io_status, 0, sizeof(io_status));
>
> Is this just an informative link to the API docs, or is the comment about
> the memset() in particular? If it's the former, the comment seems to be
> doing a Google/Bing search for you, so maybe it's better without it?

Will remove. I just wanted to make sure everyone knows that this is
documented somewhere :).


* Re: [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure
  2021-09-21 23:46         ` Ævar Arnfjörð Bjarmason
@ 2021-09-22  1:27           ` Neeraj Singh
  2021-09-23 22:32             ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-22  1:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 4:53 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > The update-index functionality is used internally by 'git stash push' to
> > setup the internal stashed commit.
> >
> > This change enables bulk-checkin for update-index infrastructure to
> > speed up adding new objects to the object database by leveraging the
> > pack functionality and the new bulk-fsync functionality. This mode
> > is enabled when passing paths to update-index via the --stdin flag,
> > as is done by 'git stash'.
> >
> > There is some risk with this change, since under batch fsync, the object
> > files will not be available until the update-index is entirely complete.
> > This usage is unlikely, since any tool invoking update-index and
> > expecting to see objects would have to snoop the output of --verbose to
> > find out when update-index has actually processed a given path.
> > Additionally the index is locked for the duration of the update.
>
> Would you really need to sniff the verbose output? If I'm streaming data
> to update-index now it looks like I could assume before that
> update-index would have done the work if I managed to fflush() to it,
> since it's processing a line at a time and doing the work in that
> line-at-a-time loop.
>
> I.e. you could print lines to it, and then do concurrent object lookups
> knowing the data was written already...
>
> I think this is probably fine, but that case seems way likelier than
> someone sniffing back the verbose output, presumably for the "add" in
> update_one(), but that's called in the getline_fn() loop...

Does fflush really guarantee that the reader has picked up the input from
a pipe across all environments?  Even if a reader picks up the input, does
that mean that the reader is done processing it?

Do you think I really need to revise this comment? Maybe leave a terser,
'this usage is thought to be unlikely'?

>
> All of this makes me wonder why this isn't using tmp-objdir.c, i.e. we
> could have our cake and eat it too by writing the "real" objects, and
> then just renaming them between directories instead. But perhaps the
> answer has something to do with the metadata issues I raised.
>
> And well, tmp-objdir.c isn't going to help someone in practice that's
> relying on this "update-index --stdin" behavior, as they won't know
> where we staged the temporary files...
>

One motivation behind the current design of renaming the files within
the same directory is that some networked filesystems don't seem to like
cross-directory renames much. It also happens that ReFS on Windows
prefers renames to stay within the directory. Actually, any filesystem
would likely be slightly faster, since fewer objects are being modified
(one directory versus two).
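For illustration, the same-directory pattern described above (create the temporary file next to its final name, then rename() it into place so only one directory is modified) can be sketched as follows. write_via_same_dir_tmp() is a hypothetical name; the tmp_obj_XXXXXX template mirrors the temporary names mentioned in the patch description, and the per-file fsync() is only there to keep the sketch simple, since batch mode would defer that flush.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define TMP_TEMPLATE "tmp_obj_XXXXXX"

/*
 * Hypothetical helper: write buf to a temporary file created in the same
 * directory as final_path, then rename() it into place. Both names live in
 * one directory, so the rename only modifies that directory.
 * Returns 0 on success, -1 on failure (the temporary file is removed).
 */
static int write_via_same_dir_tmp(const char *final_path, const void *buf, size_t len)
{
	char tmp[4096];
	const char *slash = strrchr(final_path, '/');
	size_t dirlen = slash ? (size_t)(slash - final_path) + 1 : 0;
	int fd;

	if (dirlen + sizeof(TMP_TEMPLATE) > sizeof(tmp))
		return -1;
	memcpy(tmp, final_path, dirlen);
	memcpy(tmp + dirlen, TMP_TEMPLATE, sizeof(TMP_TEMPLATE)); /* includes the NUL */

	fd = mkstemp(tmp); /* replaces the XXXXXX in place */
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, final_path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```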


* Re: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
  2021-09-21 23:54         ` Ævar Arnfjörð Bjarmason
@ 2021-09-22  1:30           ` Neeraj Singh
  2021-09-22  1:58             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-22  1:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 4:58 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > Add test cases to exercise batch mode for 'git add'
> > and 'git stash'. These tests ensure that the added
> > data winds up in the object database.
> >
> > I verified the tests by introducing an incorrect rename
> > in do_sync_and_rename.
> >
> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> > ---
> >  t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
> >  t/t3700-add.sh        | 11 +++++++++++
> >  t/t3903-stash.sh      | 14 ++++++++++++++
> >  3 files changed, 59 insertions(+)
> >  create mode 100644 t/lib-unique-files.sh
> >
> > diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
> > new file mode 100644
> > index 00000000000..a8a25eba61d
> > --- /dev/null
> > +++ b/t/lib-unique-files.sh
> > @@ -0,0 +1,34 @@
> > +# Helper to create files with unique contents
> > +
> > +test_create_unique_files_base__=$(date -u)
> > +test_create_unique_files_counter__=0
> > +
> > +# Create multiple files with unique contents. Takes the number of
> > +# directories, the number of files in each directory, and the base
> > +# directory.
> > +#
> > +# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
> > +#                                each in the specified directory, all
> > +#                                with unique contents.
> > +
> > +test_create_unique_files() {
> > +     test "$#" -ne 3 && BUG "3 param"
> > +
> > +     local dirs=$1
> > +     local files=$2
> > +     local basedir=$3
> > +
> > +     rm -rf $basedir >/dev/null
>
> Why the >/dev/null? It's not a "-rfv", and any errors would go to
> stderr.

Will fix. Clearly I don't know UNIX very well.

>
> > +             mkdir -p "$dir" > /dev/null
>
> Ditto.

Will fix.

>
> > +             for j in $(test_seq $files)
> > +             do
> > +                     test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
> > +                     echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
>
> Would be much more readable if these variables were shorter.
>
> But actually, why are we trying to create files as a function of "date
> -u" at all? This is all in the trash directory, which is rm -rf'd between
> runs, why aren't names created with test_seq or whatever OK? I.e. just
> 1.txt, 2.txt....
>

The uniqueness is in the contents of the file.  I wanted to make sure that
we are really creating new objects and not reusing old ones.  Is the scope
of the "trash repo" small enough that I can be guaranteed that a new one
is created before my test since the last time I tried adding something to
the ODB?

> > +test_expect_success 'stash with core.fsyncobjectfiles=batch' "
> > +     test_create_unique_files 2 4 fsync-files &&
> > +     git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
> > +     rm -f fsynced_files &&
> > +
> > +     # The files were untracked, so use the third parent,
> > +     # which contains the untracked files
> > +     git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
> > +     test_line_count = 8 fsynced_files &&
> > +     cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
> > +"
> > +
> > +
> >  test_expect_success 'stash -c stash.useBuiltin=false warning ' '
> >       expected="stash.useBuiltin support has been removed" &&
>
> We really prefer our tests to create the same data each time if
> possible, but as noted with the "date -u" comment above you're
> explicitly bypassing that, but I still can't see why...

I'm trying to make sure we get new object contents. Is there a better
way to achieve what I want without the risk of finding that the contents
are already in the database from a previous test run?

Thanks again for the thorough review,
-Neeraj


* Re: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
  2021-09-22  1:30           ` Neeraj Singh
@ 2021-09-22  1:58             ` Ævar Arnfjörð Bjarmason
  2021-09-22 17:55               ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-22  1:58 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Tue, Sep 21 2021, Neeraj Singh wrote:

> On Tue, Sep 21, 2021 at 4:58 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>>
>> > From: Neeraj Singh <neerajsi@microsoft.com>
>> >
>> > Add test cases to exercise batch mode for 'git add'
>> > and 'git stash'. These tests ensure that the added
>> > data winds up in the object database.
>> >
>> > I verified the tests by introducing an incorrect rename
>> > in do_sync_and_rename.
>> >
>> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
>> > ---
>> >  t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
>> >  t/t3700-add.sh        | 11 +++++++++++
>> >  t/t3903-stash.sh      | 14 ++++++++++++++
>> >  3 files changed, 59 insertions(+)
>> >  create mode 100644 t/lib-unique-files.sh
>> >
>> > diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
>> > new file mode 100644
>> > index 00000000000..a8a25eba61d
>> > --- /dev/null
>> > +++ b/t/lib-unique-files.sh
>> > @@ -0,0 +1,34 @@
>> > +# Helper to create files with unique contents
>> > +
>> > +test_create_unique_files_base__=$(date -u)
>> > +test_create_unique_files_counter__=0
>> > +
>> > +# Create multiple files with unique contents. Takes the number of
>> > +# directories, the number of files in each directory, and the base
>> > +# directory.
>> > +#
>> > +# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
>> > +#                                each in the specified directory, all
>> > +#                                with unique contents.
>> > +
>> > +test_create_unique_files() {
>> > +     test "$#" -ne 3 && BUG "3 param"
>> > +
>> > +     local dirs=$1
>> > +     local files=$2
>> > +     local basedir=$3
>> > +
>> > +     rm -rf $basedir >/dev/null
>>
>> Why the >/dev/null? It's not a "-rfv", and any errors would go to
>> stderr.
>
> Will fix. Clearly I don't know UNIX very well.
>
>>
>> > +             mkdir -p "$dir" > /dev/null
>>
>> Ditto.
>
> Will fix.
>
>>
>> > +             for j in $(test_seq $files)
>> > +             do
>> > +                     test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
>> > +                     echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
>>
>> Would be much more readable if these variables were shorter.
>>
>> But actually, why are we trying to create files as a function of "date
>> -u" at all? This is all in the trash directory, which is rm -rf'd between
>> runs, why aren't names created with test_seq or whatever OK? I.e. just
>> 1.txt, 2.txt....
>>
>
> The uniqueness is in the contents of the file.  I wanted to make sure that
> we are really creating new objects and not reusing old ones.  Is the scope
> of the "trash repo" small enough that I can be guaranteed that a new one
> is created before my test since the last time I tried adding something to
> the ODB?
>
>> > +test_expect_success 'stash with core.fsyncobjectfiles=batch' "
>> > +     test_create_unique_files 2 4 fsync-files &&
>> > +     git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
>> > +     rm -f fsynced_files &&
>> > +
>> > +     # The files were untracked, so use the third parent,
>> > +     # which contains the untracked files
>> > +     git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
>> > +     test_line_count = 8 fsynced_files &&
>> > +     cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
>> > +"
>> > +
>> > +
>> >  test_expect_success 'stash -c stash.useBuiltin=false warning ' '
>> >       expected="stash.useBuiltin support has been removed" &&
>>
>> We really prefer our tests to create the same data each time if
>> possible, but as noted with the "date -u" comment above you're
>> explicitly bypassing that, but I still can't see why...
>
> I'm trying to make sure we get new object contents. Is there a better
> way to achieve what I want without the risk of finding that the contents
> are already in the database from a previous test run?

You can just do something like:

test_expect_success 'setup data' '
	test_commit A &&
	test_commit B
'

Which will create files A.t, B.t etc, or create them via:

    obj=$(echo foo | git hash-object -w --stdin)

etc.

I.e. the uniqueness you're adding here seems to assume that tests
re-use the same object store across runs, but we create a new trash
directory for each run (if you run the test with "-d" you can see it
being left behind for inspection), so fresh objects are already ensured.

The only potential caveat I can imagine is that a filesystem like btrfs
that does copy-on-write or de-duplication might behave differently, but
other than that...

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-22  1:23           ` Neeraj Singh
@ 2021-09-22  2:02             ` Ævar Arnfjörð Bjarmason
  2021-09-22 19:46             ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-22  2:02 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Tue, Sep 21 2021, Neeraj Singh wrote:

> On Tue, Sep 21, 2021 at 4:41 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>>
>> > When the new mode is enabled we do the following for new objects:
>> >
>> > 1. Create a tmp_obj_XXXX file and write the object data to it.
>> > 2. Issue a pagecache writeback request and wait for it to complete.
>> > 3. Record the tmp name and the final name in the bulk-checkin state for
>> >    later rename.
>> >
>> > At the end of the entire transaction we:
>> > 1. Issue a fsync against the lock file to flush the hardware writeback
>> >    cache, which should by now have processed the tmp file writes.
>> > 2. Rename all of the temp files to their final names.
>> > 3. When updating the index and/or refs, we assume that Git will issue
>> >    another fsync internal to that operation.
>>
>> Perhaps note too that:
>>
>> 4. For loose objects, refs etc. we may or may not create directories,
>>    and most certainly will be updating metadata on the immediate
>>    directory containing the file, but none of that's fsync()'d.
>>
>> > On a filesystem with a singular journal that is updated during name
>> > operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
>> > would expect the fsync to trigger a journal writeout so that this
>> > sequence is enough to ensure that the user's data is durable by the time
>> > the git command returns.
>> >
>> > This change also updates the macOS code to trigger a real hardware flush
>> > via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
>> > macOS there was no guarantee of durability since a simple fsync(2) call
>> > does not flush any hardware caches.
>>
>> There's no discussion of whether this is or isn't known to also work on
>> some Linux FSes, and, for the OSes where this does work, whether it
>> covers only the object files themselves or metadata also "rides along".
>>
>
> I unfortunately can't examine Linux kernel source code, and the details
> of metadata consistency behavior across files are not something that
> anyone in that group wants to pin down. As far as I can tell, the only
> thing that's really guaranteed is fsyncing every single file you write
> and its parent directory if you're creating a new file (which we always
> are). As came up in conversation with Christoph Hellwig elsewhere in
> the thread, Linux doesn't have any set of syscalls to make batch mode
> safe. It does look like XFS would be safe if sync_file_range actually
> promised to wait for all pagecache writeback definitively, since it
> would do a "log force" to push all the dirty metadata to disk when we
> do our final fsync.
>
> I really didn't want to say something definitive about what Linux can
> or will do, since I'm not in a position to really know or influence
> them. Christoph did say that he would be interested in contributing a
> variant to this patch that would be definitively safe on filesystems
> that honor syncfs.

*nod*, it's fine if it's omitted. I was just wondering whether we knew
but weren't saying.
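For comparison with the "fsync every file and its parent directory" rule described above, a minimal POSIX sketch of durably creating a single file might look like the following. `durable_create` is a hypothetical helper, the `tmp_obj_XXXXXX` pattern is only illustrative, and calling fsync() on a directory opened read-only is assumed to be supported by the filesystem; this is not Git's code.

```c
#define _GNU_SOURCE /* exposes mkstemp(), fsync() and mkdtemp() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: durably create dir/name containing buf.
 * Returns 0 on success, -1 on error.
 */
static int durable_create(const char *dir, const char *name, const char *buf)
{
	char tmp[4096], dst[4096];
	size_t len = strlen(buf);
	int fd, dfd, ret;

	snprintf(tmp, sizeof(tmp), "%s/tmp_obj_XXXXXX", dir);
	snprintf(dst, sizeof(dst), "%s/%s", dir, name);

	/* 1. Write the data to a temp file and flush it to disk. */
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd)) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	/* 2. Give the file its final name. */
	if (close(fd) || rename(tmp, dst)) {
		unlink(tmp);
		return -1;
	}
	/* 3. The rename changed the directory; flush that metadata too. */
	dfd = open(dir, O_RDONLY);
	if (dfd < 0)
		return -1;
	ret = fsync(dfd);
	close(dfd);
	return ret;
}
```

Per point 4 quoted above, Git's `true` mode only performs the first of these flushes (the file itself), not the directory fsync in step 3.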

>> > _Performance numbers_:
>> >
>> > Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
>> > Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
>> > Windows - Same host as Linux, a preview version of Windows 11.
>> >         This number is from a patch later in the series.
>> >
>> > Adding 500 files to the repo with 'git add'. Times reported in seconds.
>> >
>> > core.fsyncObjectFiles | Linux | Mac   | Windows
>> > ----------------------|-------|-------|--------
>> >                 false | 0.06  |  0.35 | 0.61
>> >                 true  | 1.88  | 11.18 | 2.47
>> >                 batch | 0.15  |  0.41 | 1.53
>>
>> Per my https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com
>> and 6/6 in this series we've got perf tests for add/stash, but it would
>> be really interesting to see how this is impacted by
>> transfer.unpackLimit in cases where we may be writing packs or loose
>> objects.
>
> I'm having trouble understanding how unpackLimit is related to 'git
> stash' or 'git add'. From code inspection, it doesn't look like we're
> using those settings for adding objects except from across a transport.
>
> Are you proposing that we have a similar setting for adding objects
> via 'add' using a packfile? I think that would be a good goal, but it
> might be a bit tricky since we've likely done a lot of the work to
> buffer the input objects in order to compute their OIDs before we know
> how many objects there are to add. If the policy were to "always add
> to a packfile", it would be easier.

No, just that the documentation should explain to the reader that this
mode, which optimizes for writing loose objects, benefits particular
commands, but that e.g. on the server side we'll probably never write
500 loose objects, and will instead stream them into one pack.

Which might also inform next steps for the commands this does help with,
i.e. can we make more things stream to packs? I think having this mode
is at worst a good transitory thing to have, but perhaps longer term
we'll want to simply write fewer individual loose objects.

In any case, pushing to a server with this configured and scaling that
by transfer.unpackLimit should nicely demonstrate the pack vs. loose
object scenario at different fsync settings.

>>
>> > [...]
>> >  core.fsyncObjectFiles::
>> > -     This boolean will enable 'fsync()' when writing object files.
>> > -+
>> > -This is a total waste of time and effort on a filesystem that orders
>> > -data writes properly, but can be useful for filesystems that do not use
>> > -journalling (traditional UNIX filesystems) or that only journal metadata
>> > -and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
>> > +     A value indicating the level of effort Git will expend in
>> > +     trying to make objects added to the repo durable in the event
>> > +     of an unclean system shutdown. This setting currently only
>> > +     controls the object store, so updates to any refs or the
>> > +     index may not be equally durable.
>>
>> All these mentions of "object" should really clarify that it's "loose
>> objects", i.e. we always fsync pack files.
>>
>> > +* `false` allows data to remain in file system caches according to
>> > +  operating system policy, whence it may be lost if the system loses power
>> > +  or crashes.
>>
>> As noted in point #4 of
>> https://lore.kernel.org/git/87mtp5cwpn.fsf@evledraar.gmail.com/ while
>> this direction is overall an improvement over the previously flippant
>> docs, they at least alluded to the context that the assumption behind
>> "false" is that you don't really care about loose objects, you care
>> about loose objects *and* the ref update or whatever.
>>
>> As I think (this is from memory) we've covered already this may have
>> been all based on some old ext3 assumption, but it's probably worth
>> summarizing that here, i.e. if you've got an FS with global ordered
>> operations you can probably skip this, but probably not etc.
>>
>> > +* `true` triggers a data integrity flush for each object added to the
>> > +  object store. This is the safest setting that is likely to ensure durability
>> > +  across all operating systems and file systems that honor the 'fsync' system
>> > +  call. However, this setting comes with a significant performance cost on
>> > +  common hardware.
>>
>> This is really overpromising things by omitting the fact that even if
>> we're getting this feature you've hacked up right, we're still not
>> fsyncing dir entries etc (also noted above).
>>
>> So something that describes the narrow scope here, along with "loose
>> objects" etc....
>>
>> > +* `batch` enables an experimental mode that uses interfaces available in some
>> > +  operating systems to write object data with a minimal set of FLUSH CACHE
>> > +  (or equivalent) commands sent to the storage controller. If the operating
>> > +  system interfaces are not available, this mode behaves the same as `true`.
>> > +  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
>> > +  filesystems and on Windows for repos stored on NTFS or ReFS.
>>
>> Again, even if it's called "core.fsyncObjectFiles" if we're going to say
>> "safe" we really need to say safe in what sense. Having written and
>> fsync()'d the file is helping nobody if the metadata never arrives....
>>
>
> My concern with your feedback here is that this is user-facing
> documentation. I'd assume that people who are not intimately familiar
> with both their filesystem and Git's internals would just be completely
> mystified by a long commentary on the specifics in the config
> documentation. I think over time Git should focus on making this
> setting really guarantee durability in a meaningful way across the
> entire repository.

Yeah, this setting though is probably going to be tweaked only by fairly
expert-level users of git.

I think it's fine if it just explicitly punts and says something like
'this is what it does, this may or may not work on your FS' etc., my
main issue with the current docs is that they give off this vibe of
knowing a lot more than they're telling you.

>> > +static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
>> > +{
>> > +     if (fsync_state->nr) {
>>
>> I think less indentation here would be nice:
>>
>>     if (!fsync_state->nr)
>>         return;
>>     /* rest of unindented body */
>>
>
> Will fix.
>
>> Or better yet do this check in unplug_bulk_checkin(), then here:
>>
>>     fsync_or_die();
>>     for_each_string_list_item() { ...}
>>     string_list_clear(....);
>>
>>
>
> I'd prefer to put it in the callee for reasons of separation of
> concerns. I don't want to have the caller and callee partially
> implement the contract. The compiler should do a good enough job,
> since it's only one caller and will probably get totally inlined.

*nod*

For what it's worth I meant the "inlined" just in terms of avoiding the
indirection for human readers, it won't matter to the machine,
especially since this is all I/O bound...
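To make the batched sequence from the quoted patch description concrete (write each object to a temp file and request writeback, then one flush and all the renames at the end of the transaction), here is a minimal stand-alone POSIX sketch. It is not the patch's code: `batch_write`, `write_tmp`, `full_fsync` and the `bulk_fsync_` dummy file name are hypothetical; the Linux `sync_file_range()` call and the macOS `F_FULLFSYNC` fallback are assumptions about how the steps map onto real syscalls; and, as discussed in this thread, Linux does not promise that the final flush covers the earlier writeback. The empty-batch check uses the early-return style suggested above.

```c
#define _GNU_SOURCE /* exposes sync_file_range() and mkdtemp() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PATH_BUF 4096

/* One flush that also empties the drive cache where the OS allows it. */
static int full_fsync(int fd)
{
#ifdef F_FULLFSYNC
	/* macOS: fsync(2) alone does not flush the drive's write cache. */
	if (!fcntl(fd, F_FULLFSYNC))
		return 0;
#endif
	return fsync(fd);
}

/* Per object: write content to a new temp file in dir, no cache FLUSH. */
static int write_tmp(const char *dir, const char *content, char *tmp, size_t sz)
{
	size_t len = strlen(content);
	int fd;

	snprintf(tmp, sz, "%s/tmp_obj_XXXXXX", dir);
	fd = mkstemp(tmp);
	if (fd < 0)
		return -1;
	if (write(fd, content, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
#ifdef __linux__
	/* Start writeback of the dirty pages and wait for it to finish. */
	sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
			SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
#endif
	return close(fd);
}

/* Hypothetical batch writer: 0 on success, -1 on error (temp files are
 * not cleaned up on error in this sketch). */
static int batch_write(const char *dir, const char *const names[],
		       const char *const contents[], size_t n)
{
	char (*tmp)[PATH_BUF];
	char path[PATH_BUF];
	int fd, flushed, ret = -1;
	size_t i;

	if (!n)
		return 0; /* nothing queued: skip the flush entirely */
	tmp = calloc(n, sizeof(*tmp));
	if (!tmp)
		return -1;
	for (i = 0; i < n; i++)
		if (write_tmp(dir, contents[i], tmp[i], sizeof(tmp[i])))
			goto out;

	/* End of transaction, step 1: a single flush via a dummy file. */
	snprintf(path, sizeof(path), "%s/bulk_fsync_XXXXXX", dir);
	fd = mkstemp(path);
	if (fd < 0)
		goto out;
	flushed = !full_fsync(fd);
	close(fd);
	unlink(path);
	if (!flushed)
		goto out;

	/* Step 2: rename every temp file to its final name. */
	for (i = 0; i < n; i++) {
		snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
		if (rename(tmp[i], path))
			goto out;
	}
	ret = 0;
out:
	free(tmp);
	return ret;
}
```

The point of the design is that the expensive FLUSH CACHE is paid once per batch instead of once per object; whether the renamed entries are themselves durable still depends on the filesystem's journaling behavior, as noted in point 4 above.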

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
  2021-09-22  1:58             ` Ævar Arnfjörð Bjarmason
@ 2021-09-22 17:55               ` Neeraj Singh
  2021-09-22 20:01                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-22 17:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 7:02 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Sep 21 2021, Neeraj Singh wrote:
>
> > On Tue, Sep 21, 2021 at 4:58 PM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >>
> >> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
> >>
> >> > From: Neeraj Singh <neerajsi@microsoft.com>
> >> >
> >> > Add test cases to exercise batch mode for 'git add'
> >> > and 'git stash'. These tests ensure that the added
> >> > data winds up in the object database.
> >> >
> >> > I verified the tests by introducing an incorrect rename
> >> > in do_sync_and_rename.
> >> >
> >> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> >> > ---
> >> >  t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
> >> >  t/t3700-add.sh        | 11 +++++++++++
> >> >  t/t3903-stash.sh      | 14 ++++++++++++++
> >> >  3 files changed, 59 insertions(+)
> >> >  create mode 100644 t/lib-unique-files.sh
> >> >
> >> > diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
> >> > new file mode 100644
> >> > index 00000000000..a8a25eba61d
> >> > --- /dev/null
> >> > +++ b/t/lib-unique-files.sh
> >> > @@ -0,0 +1,34 @@
> >> > +# Helper to create files with unique contents
> >> > +
> >> > +test_create_unique_files_base__=$(date -u)
> >> > +test_create_unique_files_counter__=0
> >> > +
> >> > +# Create multiple files with unique contents. Takes the number of
> >> > +# directories, the number of files in each directory, and the base
> >> > +# directory.
> >> > +#
> >> > +# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
> >> > +#                                each in the specified directory, all
> >> > +#                                with unique contents.
> >> > +
> >> > +test_create_unique_files() {
> >> > +     test "$#" -ne 3 && BUG "3 param"
> >> > +
> >> > +     local dirs=$1
> >> > +     local files=$2
> >> > +     local basedir=$3
> >> > +
> >> > +     rm -rf $basedir >/dev/null
> >>
> >> Why the >/dev/null? It's not a "-rfv", and any errors would go to
> >> stderr.
> >
> > Will fix. Clearly I don't know UNIX very well.
> >
> >>
> >> > +             mkdir -p "$dir" > /dev/null
> >>
> >> Ditto.
> >
> > Will fix.
> >
> >>
> >> > +             for j in $(test_seq $files)
> >> > +             do
> >> > +                     test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
> >> > +                     echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
> >>
> >> Would be much more readable if these variables were shorter.
> >>
> >> But actually, why are we trying to create files as a function of "date
> >> -u" at all? This is all in the trash directory, which is rm -rf'd between
> >> runs, why aren't names created with test_seq or whatever OK? I.e. just
> >> 1.txt, 2.txt....
> >>
> >
> > The uniqueness is in the contents of the file.  I wanted to make sure that
> > we are really creating new objects and not reusing old ones.  Is the scope
> > of the "trash repo" small enough that I can be guaranteed that a new one
> > is created before my test since the last time I tried adding something to
> > the ODB?
> >
> >> > +test_expect_success 'stash with core.fsyncobjectfiles=batch' "
> >> > +     test_create_unique_files 2 4 fsync-files &&
> >> > +     git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
> >> > +     rm -f fsynced_files &&
> >> > +
> >> > +     # The files were untracked, so use the third parent,
> >> > +     # which contains the untracked files
> >> > +     git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
> >> > +     test_line_count = 8 fsynced_files &&
> >> > +     cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
> >> > +"
> >> > +
> >> > +
> >> >  test_expect_success 'stash -c stash.useBuiltin=false warning ' '
> >> >       expected="stash.useBuiltin support has been removed" &&
> >>
> >> We really prefer our tests to create the same data each time if
> >> possible, but as noted with the "date -u" comment above you're
> >> explicitly bypassing that, but I still can't see why...
> >
> > I'm trying to make sure we get new object contents. Is there a better
> > way to achieve what I want without the risk of finding that the contents
> > are already in the database from a previous test run?
>
> You can just do something like:
>
> test_expect_success 'setup data' '
>         test_commit A &&
>         test_commit B
> '
>
> Which will create files A.t, B.t etc, or create them via:
>
>     obj=$(echo foo | git hash-object -w --stdin)
>
> etc.
>
> I.e. the uniqueness you're doing here seems to assume that tests are
> re-using the same object store across runs, but we create a new trash
> directory for each one, if you run the test with "-d" you can see it
> being left behind for inspection. This is already ensured for the test.
>
> The only potential caveat I can imagine is that some filesystem like say
> btrfs-like that does some COW or object de-duplication would behave
> differently, but other than that...

It looks like the same repo is reused for each test_expect_success
line in the top-level t*.sh script. So for test_create_unique_files to
be maximally useful, it should have some state that is different for
each invocation. How about I use the test_tick mechanism to produce
this uniqueness? It wouldn't be globally unique like the date method,
but it should be good enough if the repo is recycled every time
test-lib is reinitialized.

I'm changing lib-unique-files to use test_tick and to be a little more
readable as you suggested. Please let me know if you have any other
suggestions.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes
  2021-09-22  1:23           ` Neeraj Singh
  2021-09-22  2:02             ` Ævar Arnfjörð Bjarmason
@ 2021-09-22 19:46             ` Neeraj Singh
  1 sibling, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-22 19:46 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 6:23 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 4:41 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> >
> > On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:

> > > -             fsync_object_files = git_config_bool(var, value);
> > > +             if (value && !strcmp(value, "batch"))
> > > +                     fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
> > > +             else if (git_config_bool(var, value))
> > > +                     fsync_object_files = FSYNC_OBJECT_FILES_ON;
> > > +             else
> > > +                     fsync_object_files = FSYNC_OBJECT_FILES_OFF;
> >
> > Since the point of this setting is safety, let's explicitly check
> > true/false here, use git_config_maybe_bool(), and perhaps issue a
> > warning on unknown values, but maybe that would get too verbose...
> >
> > If we have a future "supersafe" mode, it'll get mapped to "false" on
> > older versions of git, probably not a good idea...
> >
>
>  I took Junio's suggestion verbatim.  I'll try a warning if the value
> exists, and is not 'batch' or <maybe bool>.

An update on this.  I tested out some values:
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=batch add ./
    fsync_object_files: 2
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=0 add ./
    fsync_object_files: 0
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=1 add ./
    fsync_object_files: 1
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=2 add ./
    fsync_object_files: 1
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=barf add ./
    fatal: bad boolean config value 'barf' for 'core.fsyncobjectfiles'
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=true add ./
    fsync_object_files: 1
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=false add ./
    fsync_object_files: 0
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=t add ./
    fatal: bad boolean config value 't' for 'core.fsyncobjectfiles'
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=y add ./
    fatal: bad boolean config value 'y' for 'core.fsyncobjectfiles'
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=yes add ./
    fsync_object_files: 1
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=no add ./
    fsync_object_files: 0
    nksingh@neerajsi-x1:~/src/git$ ./git -c core.fsyncobjectfiles=nope add ./
    fatal: bad boolean config value 'nope' for 'core.fsyncobjectfiles'

So I think the code already works like you are suggesting (thanks Junio!).
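The accepted and rejected values in that transcript can be reproduced with a small stand-alone parser. This is a simplified sketch, not Git's code: `parse_fsync_object_files`, `parse_bool_text` and the `FSYNC_MODE_*` names are hypothetical, and the integer fallback uses plain `strtol()`, whereas Git's own integer parsing also accepts unit suffixes such as `k` and `m`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

enum fsync_mode {
	FSYNC_MODE_INVALID = -1, /* caller would die("bad boolean config value") */
	FSYNC_MODE_OFF = 0,
	FSYNC_MODE_ON = 1,
	FSYNC_MODE_BATCH = 2,
};

/* Textual booleans: returns 1, 0, or -1 if value is not one of them. */
static int parse_bool_text(const char *value)
{
	if (!value)
		return 1; /* "[core] fsyncObjectFiles" with no "=" means true */
	if (!*value)
		return 0; /* an empty value means false */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return 0;
	return -1;
}

static enum fsync_mode parse_fsync_object_files(const char *value)
{
	char *end;
	long n;
	int b;

	if (value && !strcmp(value, "batch"))
		return FSYNC_MODE_BATCH;
	b = parse_bool_text(value);
	if (b >= 0)
		return b ? FSYNC_MODE_ON : FSYNC_MODE_OFF;

	/* Integers count as booleans too: any non-zero value, e.g. "2", is true. */
	errno = 0;
	n = strtol(value, &end, 10);
	if (end == value || *end || errno)
		return FSYNC_MODE_INVALID; /* e.g. "t", "y", "nope" */
	return n ? FSYNC_MODE_ON : FSYNC_MODE_OFF;
}
```

Because unrecognized words are rejected rather than silently treated as false, this mirrors the "bad boolean config value" errors shown in the transcript.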

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode
  2021-09-22 17:55               ` Neeraj Singh
@ 2021-09-22 20:01                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-22 20:01 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh


On Wed, Sep 22 2021, Neeraj Singh wrote:

> On Tue, Sep 21, 2021 at 7:02 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Tue, Sep 21 2021, Neeraj Singh wrote:
>>
>> > On Tue, Sep 21, 2021 at 4:58 PM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>
>> >>
>> >> On Mon, Sep 20 2021, Neeraj Singh via GitGitGadget wrote:
>> >>
>> >> > From: Neeraj Singh <neerajsi@microsoft.com>
>> >> >
>> >> > Add test cases to exercise batch mode for 'git add'
>> >> > and 'git stash'. These tests ensure that the added
>> >> > data winds up in the object database.
>> >> >
>> >> > I verified the tests by introducing an incorrect rename
>> >> > in do_sync_and_rename.
>> >> >
>> >> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
>> >> > ---
>> >> >  t/lib-unique-files.sh | 34 ++++++++++++++++++++++++++++++++++
>> >> >  t/t3700-add.sh        | 11 +++++++++++
>> >> >  t/t3903-stash.sh      | 14 ++++++++++++++
>> >> >  3 files changed, 59 insertions(+)
>> >> >  create mode 100644 t/lib-unique-files.sh
>> >> >
>> >> > diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
>> >> > new file mode 100644
>> >> > index 00000000000..a8a25eba61d
>> >> > --- /dev/null
>> >> > +++ b/t/lib-unique-files.sh
>> >> > @@ -0,0 +1,34 @@
>> >> > +# Helper to create files with unique contents
>> >> > +
>> >> > +test_create_unique_files_base__=$(date -u)
>> >> > +test_create_unique_files_counter__=0
>> >> > +
>> >> > +# Create multiple files with unique contents. Takes the number of
>> >> > +# directories, the number of files in each directory, and the base
>> >> > +# directory.
>> >> > +#
>> >> > +# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
>> >> > +#                                each in the specified directory, all
>> >> > +#                                with unique contents.
>> >> > +
>> >> > +test_create_unique_files() {
>> >> > +     test "$#" -ne 3 && BUG "3 param"
>> >> > +
>> >> > +     local dirs=$1
>> >> > +     local files=$2
>> >> > +     local basedir=$3
>> >> > +
>> >> > +     rm -rf $basedir >/dev/null
>> >>
>> >> Why the >/dev/null? It's not a "-rfv", and any errors would go to
>> >> stderr.
>> >
>> > Will fix. Clearly I don't know UNIX very well.
>> >
>> >>
>> >> > +             mkdir -p "$dir" > /dev/null
>> >>
>> >> Ditto.
>> >
>> > Will fix.
>> >
>> >>
>> >> > +             for j in $(test_seq $files)
>> >> > +             do
>> >> > +                     test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
>> >> > +                     echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
>> >>
>> >> Would be much more readable if these variables were shorter.
>> >>
>> >> But actually, why are we trying to create files as a function of "date
>> >> -u" at all? This is all in the trash directory, which is rm -rf'd between
>> >> runs, why aren't names created with test_seq or whatever OK? I.e. just
>> >> 1.txt, 2.txt....
>> >>
>> >
>> > The uniqueness is in the contents of the file.  I wanted to make sure that
>> > we are really creating new objects and not reusing old ones.  Is the scope
>> > of the "trash repo" small enough that I can be guaranteed that a new one
>> > is created before my test since the last time I tried adding something to
>> > the ODB?
>> >
>> >> > +test_expect_success 'stash with core.fsyncobjectfiles=batch' "
>> >> > +     test_create_unique_files 2 4 fsync-files &&
>> >> > +     git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
>> >> > +     rm -f fsynced_files &&
>> >> > +
>> >> > +     # The files were untracked, so use the third parent,
>> >> > +     # which contains the untracked files
>> >> > +     git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
>> >> > +     test_line_count = 8 fsynced_files &&
>> >> > +     cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
>> >> > +"
>> >> > +
>> >> > +
>> >> >  test_expect_success 'stash -c stash.useBuiltin=false warning ' '
>> >> >       expected="stash.useBuiltin support has been removed" &&
>> >>
>> >> We really prefer our tests to create the same data each time if
>> >> possible, but as noted with the "date -u" comment above you're
>> >> explicitly bypassing that, but I still can't see why...
>> >
>> > I'm trying to make sure we get new object contents. Is there a better
>> > way to achieve what I want without the risk of finding that the contents
>> > are already in the database from a previous test run?
>>
>> You can just do something like:
>>
>> test_expect_success 'setup data' '
>>         test_commit A &&
>>         test_commit B
>> '
>>
>> Which will create files A.t, B.t etc, or create them via:
>>
>>     obj=$(echo foo | git hash-object -w --stdin)
>>
>> etc.
>>
>> I.e. the uniqueness you're doing here seems to assume that tests are
>> re-using the same object store across runs, but we create a new trash
>> directory for each one, if you run the test with "-d" you can see it
>> being left behind for inspection. This is already ensured for the test.
>>
>> The only potential caveat I can imagine is that some filesystem like say
>> btrfs-like that does some COW or object de-duplication would behave
>> differently, but other than that...
>
> It looks like the same repo is reused for each test_expect_success
> line in the top-level t*.sh script. So for test_create_unique_files to
> be maximally useful, it should have some state that is different for
> each invocation. How about I use the test_tick mechanism to produce
> this uniqueness? It wouldn't be globally unique like the date method,
> but it should be good enough if the repo is recycled every time
> test-lib is reinitialized.
>
> I'm changing lib-unique-files to use test_tick and to be a little more
> readable as you suggested. Please let me know if you have any other
> suggestions.

Ah, sorry, I thought you meant you wanted uniqueness within the test
file, but no, by default we'll create *one* repo for you, and each
test_expect_success reuses that.

Generally tests that want that do one of (in each test_expect_success):

# I'm making my own repo
git init new-repo-1 &&
(
	cd new-repo-1 &&
	[...]
)

# Or, in the first one
<setup the repo data>
# Then, in a second one
git clone . new-repo-1

I.e. just using "git clone" to ferry the data around, or cp -R if you'd
like to retain the exact file layout etc.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure
  2021-09-22  1:27           ` Neeraj Singh
@ 2021-09-23 22:32             ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-23 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Neeraj Singh

On Tue, Sep 21, 2021 at 6:27 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 4:53 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> > All of this makes me wonder why this isn't using tmp-objdir.c, i.e. we
> > could have our cake and eat it too by writing the "real" objects, and
> > then just renaming them between directories instead. But perhaps the
> > answer has something to do with the metadata issues I raised.
> >
> > And well, tmp-objdir.c isn't going to help someone in practice that's
> > relying on this "update-index --stdin" behavior, as they won't know
> > where we staged the temporary files...
> >
>
> One motivation of the current design behind renaming the files is that
> some networked filesystems don't seem to like cross-directory renames
> much. It also so happens that ReFS on Windows prefers renames to stay
> within the directory. Actually, any filesystem would likely be
> slightly faster, since fewer objects are being modified (one dir
> versus two).

Whelp, as part of v5 I tried to make unpack-objects.c use the batch fsync
mode, and now I see a strong reason to take your tmp-objdir suggestion.
As part of OBJ_REF_DELTA unpacking, we need access to the object while
we're in the plugged state. I didn't notice this at first, but got
lucky that I tested that case first and hit an error.

V5 will create a tmp-objdir and add a new interface to install it as
the primary objdir.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-09-20 22:15       ` [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12       ` Neeraj K. Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
                           ` (8 more replies)
  6 siblings, 9 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-09-24 20:12 UTC
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

Thanks to everyone for the reviews so far! Changes since v4, all in response to
review feedback from Ævar Arnfjörð Bjarmason:

 * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
   to add a statement about not fsyncing parent directories.
   
   * I still don't want to make any promises on behalf of the Linux FS developers
     in the documentation. However, according to [v4.1] and my understanding
     of how XFS journals are documented to work, it looks like recent versions
     of Linux running on XFS should be as safe as Windows or macOS in 'batch'
     mode. I don't know about ext4, since it's not clear to me when metadata
     updates are made visible to the journal.
   

 * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
   pointed out, this lets us access the added loose objects immediately,
   rather than only after unplugging the bulk checkin. This is a hard
   requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
   
   * As a preparatory patch, the object-file code no longer renames an object
     file when it is written into a tmp objdir (as determined by the
     quarantine environment variable).
   
   * I added support to the tmp-objdir lib to replace the 'main' writable odb.
   
   * Instead of using a lockfile for the final full fsync, we now use a new dummy
     temp file. This makes the unpack-objects change described below easier.
   

 * Add bulk-checkin support to unpack-objects, which is used in fetch and
   push. In addition to making those operations faster, it allows us to
   directly compare performance of packfiles against loose objects. Please
   see [v4.2] for a measurement of 'git push' to a local upstream with
   different numbers of unique new files.

 * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.

 * Remove comment with link to NtFlushBuffersFileEx documentation.

 * Make t/lib-unique-files.sh a bit cleaner. It still creates unique
   contents, but now uses test_tick, so the output should be deterministic
   from run to run.

 * Ensure there are tests for all of the modified commands. Make the
   unpack-objects tests validate that the unpacked objects are really
   available in the ODB.
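The batch-mode sequence these changes implement can be sketched in C. Each object is written into a temporary directory and gets a writeout-only flush. A single full flush against a dummy file follows, and only then are the objects renamed to their final names. This is a simplified illustration, not the series' code. The `batch_*` names, `writeout_only()` and `struct batch` are hypothetical. Object names are passed in rather than discovered by walking the temporary directory. The Linux branch assumes `sync_file_range()` is an acceptable stand-in for a writeout-only flush. Like the series, the sketch relies on the assumption that the final fsync's cache flush also covers the earlier writeback.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical state for one batch: objects are staged, then renamed into final. */
struct batch {
	char staging[PATH_MAX];	/* temporary directory receiving new objects */
	char final[PATH_MAX];	/* final object directory */
	int needs_full_flush;	/* set once any object relied on writeout-only */
};

/* Start writeback of the file's dirty pages and wait, without a drive cache flush. */
static int writeout_only(int fd)
{
#ifdef __linux__
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
#else
	(void)fd;
	return -1;	/* no writeout-only primitive assumed: caller falls back to fsync() */
#endif
}

static int batch_write(struct batch *b, const char *name,
		       const void *data, size_t len)
{
	char path[PATH_MAX];
	int fd;

	snprintf(path, sizeof(path), "%s/%s", b->staging, name);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
	if (fd < 0)
		return -1;
	/* A short write is treated as an error to keep the sketch small. */
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	if (writeout_only(fd) == 0) {
		b->needs_full_flush = 1;	/* data may still sit in the drive's cache */
	} else if (fsync(fd)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

static int batch_finish(struct batch *b, const char **names, int nr)
{
	char src[PATH_MAX], dst[PATH_MAX];

	if (b->needs_full_flush) {
		/* One full flush, issued against a throwaway dummy file. */
		int fd, ret;

		snprintf(src, sizeof(src), "%s/bulk_fsync_XXXXXX", b->final);
		fd = mkstemp(src);
		if (fd < 0)
			return -1;
		ret = fsync(fd);
		close(fd);
		unlink(src);
		if (ret)
			return -1;
		b->needs_full_flush = 0;
	}

	/* Only now make the objects visible under their final names. */
	for (int i = 0; i < nr; i++) {
		snprintf(src, sizeof(src), "%s/%s", b->staging, names[i]);
		snprintf(dst, sizeof(dst), "%s/%s", b->final, names[i]);
		if (rename(src, dst))
			return -1;
	}
	return 0;
}
```

If the writeout-only request fails for an object, the sketch falls back to a full `fsync()` for that file. This is similar to how the series falls back to `fsync_or_die()` when bulk checkin is not plugged or the writeout request fails.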

References for v4:

[v4.1] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/#t

[v4.2] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117

Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: we now
   accept a missing value as "true", and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for Windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.
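Here is a minimal C sketch of the option-parsing rule above: a missing value means "true", and only the lowercase spelling `batch` selects batch mode. The enum mirrors the one introduced in this series. `parse_fsync_object_files()` and `sketch_parse_bool()` are hypothetical helpers. The boolean parser is a simplified assumption, not git's `git_config_bool()`, which accepts more spellings such as arbitrary integers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Hypothetical, simplified boolean parser: 1 = true, 0 = false, -1 = invalid. */
static int sketch_parse_bool(const char *value)
{
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!*value || !strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

/* Returns an fsync_object_files_mode, or -1 for an unrecognized value. */
static int parse_fsync_object_files(const char *value)
{
	if (!value)	/* key present with no "=": treated as true */
		return FSYNC_OBJECT_FILES_ON;
	if (!strcmp(value, "batch"))	/* case-sensitive: "Batch" is rejected */
		return FSYNC_OBJECT_FILES_BATCH;
	switch (sketch_parse_bool(value)) {
	case 1:
		return FSYNC_OBJECT_FILES_ON;
	case 0:
		return FSYNC_OBJECT_FILES_OFF;
	default:
		return -1;
	}
}
```

Checking for `batch` before falling back to boolean parsing keeps the boolean spellings case-insensitive while the new keyword stays strictly lowercase.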

Neeraj Singh (7):
  object-file.c: do not rename in a temp odb
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt |  29 +++++++--
 Makefile                      |   6 ++
 builtin/add.c                 |   1 +
 builtin/unpack-objects.c      |   3 +
 builtin/update-index.c        |   6 ++
 bulk-checkin.c                |  92 +++++++++++++++++++++++---
 bulk-checkin.h                |   2 +
 cache.h                       |   8 ++-
 config.c                      |   7 +-
 config.mak.uname              |   1 +
 configure.ac                  |   8 +++
 environment.c                 |   6 +-
 git-compat-util.h             |   7 ++
 object-file.c                 | 118 +++++++++++++++++++++++++++++-----
 object-store.h                |  22 +++++++
 object.c                      |   2 +-
 repository.c                  |   2 +
 repository.h                  |   1 +
 t/lib-unique-files.sh         |  36 +++++++++++
 t/perf/p3700-add.sh           |  43 +++++++++++++
 t/perf/p3900-stash.sh         |  46 +++++++++++++
 t/t3700-add.sh                |  20 ++++++
 t/t3903-stash.sh              |  14 ++++
 t/t5300-pack-object.sh        |  30 +++++----
 tmp-objdir.c                  |  20 +++++-
 tmp-objdir.h                  |   6 ++
 wrapper.c                     |  44 +++++++++++++
 write-or-die.c                |   2 +-
 28 files changed, 532 insertions(+), 50 deletions(-)
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v5
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v4:

 -:  ----------- > 1:  95315f35a28 object-file.c: do not rename in a temp odb
 1:  d5893e28df1 = 2:  df6fab94d67 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 2:  12cad737635 ! 3:  fe19cdfc930 core.fsyncobjectfiles: batched disk flushes
     @@ Commit message
      
          One major source of the cost of fsync is the implied flush of the
          hardware writeback cache within the disk drive. Fortunately, Windows,
     -    macOS, and Linux each offer mechanisms to write data from the filesystem
     -    page cache without initiating a hardware flush.
     +    and macOS offer mechanisms to write data from the filesystem page cache
     +    without initiating a hardware flush. Linux has the sync_file_range API,
     +    which issues a pagecache writeback request reliably after version 5.2.
      
          This patch introduces a new 'core.fsyncObjectFiles = batch' option that
     -    takes advantage of the bulk-checkin infrastructure to batch up hardware
     -    flushes.
     +    batches up hardware flushes. It hooks into the bulk-checkin plugging and
     +    unplugging functionality and takes advantage of tmp-objdir.
      
     -    When the new mode is enabled we do the following for new objects:
     -
     -    1. Create a tmp_obj_XXXX file and write the object data to it.
     +    When the new mode is enabled we do the following for each new object:
     +    1. Create the object in a tmp-objdir.
          2. Issue a pagecache writeback request and wait for it to complete.
     -    3. Record the tmp name and the final name in the bulk-checkin state for
     -       later rename.
      
     -    At the end of the entire transaction we:
     -    1. Issue a fsync against the lock file to flush the hardware writeback
     -       cache, which should by now have processed the tmp file writes.
     -    2. Rename all of the temp files to their final names.
     +    At the end of the entire transaction when unplugging bulk checkin we:
     +    1. Issue an fsync against a dummy file to flush the hardware writeback
     +       cache, which should by now have processed the tmp-objdir writes.
     +    2. Rename all of the tmp-objdir files to their final names.
          3. When updating the index and/or refs, we assume that Git will issue
     -       another fsync internal to that operation.
     +       another fsync internal to that operation. This is not the case today,
     +       but may be a good extension to those components.
      
          On a filesystem with a singular journal that is updated during name
     -    operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
     +    operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we
          would expect the fsync to trigger a journal writeout so that this
          sequence is enough to ensure that the user's data is durable by the time
          the git command returns.
     @@ Documentation/config/core.txt: core.whitespace::
      +	A value indicating the level of effort Git will expend in
      +	trying to make objects added to the repo durable in the event
      +	of an unclean system shutdown. This setting currently only
     -+	controls the object store, so updates to any refs or the
     -+	index may not be equally durable.
     ++	controls loose objects in the object store, so updates to any
     ++	refs or the index may not be equally durable.
      ++
      +* `false` allows data to remain in file system caches according to
      +  operating system policy, whence it may be lost if the system loses power
      +  or crashes.
     -+* `true` triggers a data integrity flush for each object added to the
     ++* `true` triggers a data integrity flush for each loose object added to the
      +  object store. This is the safest setting that is likely to ensure durability
      +  across all operating systems and file systems that honor the 'fsync' system
      +  call. However, this setting comes with a significant performance cost on
     -+  common hardware.
     ++  common hardware. Git does not currently fsync parent directories for
     ++  newly-added files, so some filesystems may still allow data to be lost on
     ++  system crash.
      +* `batch` enables an experimental mode that uses interfaces available in some
     -+  operating systems to write object data with a minimal set of FLUSH CACHE
     -+  (or equivalent) commands sent to the storage controller. If the operating
     -+  system interfaces are not available, this mode behaves the same as `true`.
     -+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
     -+  filesystems and on Windows for repos stored on NTFS or ReFS.
     ++  operating systems to write loose object data with a minimal set of FLUSH
     ++  CACHE (or equivalent) commands sent to the storage controller. If the
     ++  operating system interfaces are not available, this mode behaves the same as
     ++  `true`. This mode is expected to be as safe as `true` on macOS for repos
     ++  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
     ++  ReFS.
       
       core.preloadIndex::
       	Enable parallel index preload for operations like 'git diff'
     @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
       
       	if (chmod_arg && pathspec.nr)
       		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
     --	unplug_bulk_checkin();
      +
     -+	unplug_bulk_checkin(&lock_file);
     + 	unplug_bulk_checkin();
       
       finish:
     - 	if (write_locked_index(&the_index, &lock_file,
      
       ## bulk-checkin.c ##
      @@
     @@ bulk-checkin.c
       #include "pack.h"
       #include "strbuf.h"
      +#include "string-list.h"
     ++#include "tmp-objdir.h"
       #include "packfile.h"
       #include "object-store.h"
       
       static int bulk_checkin_plugged;
     - 
     -+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
     ++static int needs_batch_fsync;
      +
     ++static struct tmp_objdir *bulk_fsync_objdir;
     + 
       static struct bulk_checkin_state {
       	char *pack_tmp_name;
     - 	struct hashfile *f;
      @@ bulk-checkin.c: clear_exit:
       	reprepare_packed_git(the_repository);
       }
       
     -+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
     ++/*
     ++ * Cleanup after batch-mode fsync_object_files.
     ++ */
     ++static void do_batch_fsync(void)
      +{
     -+	if (fsync_state->nr) {
     -+		struct string_list_item *rename;
     -+
     -+		/*
     -+		 * Issue a full hardware flush against the lock file to ensure
     -+		 * that all objects are durable before any renames occur.
     -+		 * The code in fsync_and_close_loose_object_bulk_checkin has
     -+		 * already ensured that writeout has occurred, but it has not
     -+		 * flushed any writeback cache in the storage hardware.
     -+		 */
     -+		fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
     -+
     -+		for_each_string_list_item(rename, fsync_state) {
     -+			const char *src = rename->string;
     -+			const char *dst = rename->util;
     -+
     -+			if (finalize_object_file(src, dst))
     -+				die_errno(_("could not rename '%s' to '%s'"), src, dst);
     -+		}
     -+
     -+		string_list_clear(fsync_state, 1);
     ++	/*
     ++	 * Issue a full hardware flush against a temporary file to ensure
     ++	 * that all objects are durable before any renames occur.  The code in
     ++	 * fsync_loose_object_bulk_checkin has already issued a writeout
     ++	 * request, but it has not flushed any writeback cache in the storage
     ++	 * hardware.
     ++	 */
     ++
     ++	if (needs_batch_fsync) {
     ++		struct strbuf temp_path = STRBUF_INIT;
     ++		struct tempfile *temp;
     ++
     ++		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
     ++		temp = xmks_tempfile(temp_path.buf);
     ++		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
     ++		delete_tempfile(&temp);
     ++		strbuf_release(&temp_path);
      +	}
     ++
     ++	if (bulk_fsync_objdir)
     ++		tmp_objdir_migrate(bulk_fsync_objdir);
      +}
      +
       static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
       	return 0;
       }
       
     -+static void add_rename_bulk_checkin(struct string_list *fsync_state,
     -+				    const char *src, const char *dst)
     ++void fsync_loose_object_bulk_checkin(int fd)
      +{
     -+	string_list_insert(fsync_state, src)->util = xstrdup(dst);
     -+}
     -+
     -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
     -+					      const char *filename, time_t mtime)
     -+{
     -+	int do_finalize = 1;
     -+	int ret = 0;
     -+
     -+	if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
     -+		/*
     -+		 * If we have a plugged bulk checkin, we issue a call that
     -+		 * cleans the filesystem page cache but avoids a hardware flush
     -+		 * command. Later on we will issue a single hardware flush
     -+		 * before renaming files as part of do_sync_and_rename.
     -+		 */
     -+		if (bulk_checkin_plugged &&
     -+		    fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
     -+		    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
     -+			add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
     -+			do_finalize = 0;
     -+
     -+		} else {
     -+			fsync_or_die(fd, "loose object file");
     -+		}
     -+	}
     -+
     -+	if (close(fd))
     -+		die_errno(_("error when closing loose object file"));
     -+
     -+	if (mtime) {
     -+		struct utimbuf utb;
     -+		utb.actime = mtime;
     -+		utb.modtime = mtime;
     -+		if (utime(tmpfile, &utb) < 0)
     -+			warning_errno(_("failed utime() on %s"), tmpfile);
     ++	assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
     ++
     ++	/*
     ++	 * If we have a plugged bulk checkin, we issue a call that
     ++	 * cleans the filesystem page cache but avoids a hardware flush
     ++	 * command. Later on we will issue a single hardware flush
     ++	 * before as part of do_batch_fsync.
     ++	 */
     ++	if (bulk_checkin_plugged &&
     ++	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
     ++		assert(the_repository->objects->odb->is_temp);
     ++		if (!needs_batch_fsync)
     ++			needs_batch_fsync = 1;
     ++	} else {
     ++		fsync_or_die(fd, "loose object file");
      +	}
     -+
     -+	if (do_finalize)
     -+		ret = finalize_object_file(tmpfile, filename);
     -+
     -+	return ret;
      +}
      +
       int index_bulk_checkin(struct object_id *oid,
       		       int fd, size_t size, enum object_type type,
       		       const char *path, unsigned flags)
     -@@ bulk-checkin.c: void plug_bulk_checkin(void)
     +@@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
     + void plug_bulk_checkin(void)
     + {
     + 	assert(!bulk_checkin_plugged);
     ++
     ++	/*
     ++	 * Create a temporary object directory if the current
     ++	 * object directory is not already temporary.
     ++	 */
     ++	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
     ++	    !the_repository->objects->odb->is_temp) {
     ++		bulk_fsync_objdir = tmp_objdir_create();
     ++		if (!bulk_fsync_objdir)
     ++			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
     ++
     ++		tmp_objdir_replace_main_odb(bulk_fsync_objdir);
     ++	}
     ++
       	bulk_checkin_plugged = 1;
       }
       
     --void unplug_bulk_checkin(void)
     -+void unplug_bulk_checkin(struct lock_file *lock_file)
     - {
     - 	assert(bulk_checkin_plugged);
     +@@ bulk-checkin.c: void unplug_bulk_checkin(void)
       	bulk_checkin_plugged = 0;
       	if (bulk_checkin_state.f)
       		finish_bulk_checkin(&bulk_checkin_state);
      +
     -+	do_sync_and_rename(&bulk_fsync_state, lock_file);
     ++	do_batch_fsync();
       }
      
       ## bulk-checkin.h ##
     @@ bulk-checkin.h
       
       #include "cache.h"
       
     -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
     -+					      const char *filename, time_t mtime);
     ++void fsync_loose_object_bulk_checkin(int fd);
      +
       int index_bulk_checkin(struct object_id *oid,
       		       int fd, size_t size, enum object_type type,
       		       const char *path, unsigned flags);
     - 
     - void plug_bulk_checkin(void);
     --void unplug_bulk_checkin(void);
     -+void unplug_bulk_checkin(struct lock_file *);
     - 
     - #endif
      
       ## cache.h ##
      @@ cache.h: void reset_shared_repository(void);
     @@ cache.h: void reset_shared_repository(void);
       extern char *git_replace_ref_base;
       
      -extern int fsync_object_files;
     -+enum FSYNC_OBJECT_FILES_MODE {
     ++enum fsync_object_files_mode {
      +    FSYNC_OBJECT_FILES_OFF,
      +    FSYNC_OBJECT_FILES_ON,
      +    FSYNC_OBJECT_FILES_BATCH
      +};
      +
     -+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
     ++extern enum fsync_object_files_mode fsync_object_files;
       extern int core_preload_index;
       extern int precomposed_unicode;
       extern int protect_hfs;
     @@ environment.c: const char *git_hooks_path;
       int core_compression_level;
       int pack_compression_level = Z_DEFAULT_COMPRESSION;
      -int fsync_object_files;
     -+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
     ++enum fsync_object_files_mode fsync_object_files;
       size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
       size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
       size_t delta_base_cache_limit = 96 * 1024 * 1024;
     @@ git-compat-util.h: __attribute__((format (printf, 1, 2))) NORETURN
        * Returns 0 on success, which includes trying to unlink an object that does
      
       ## object-file.c ##
     -@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
     - 	return 0;
     +@@ object-file.c: void add_to_alternates_memory(const char *reference)
     + 			     '\n', NULL, 0);
       }
       
     --/* Finalize a file on disk, and close it. */
     --static void close_loose_object(int fd)
     --{
     ++struct object_directory *set_temporary_main_odb(const char *dir)
     ++{
     ++	struct object_directory *main_odb, *new_odb, *old_next;
     ++
     ++	/*
     ++	 * Make sure alternates are initialized, or else our entry may be
     ++	 * overwritten when they are.
     ++	 */
     ++	prepare_alt_odb(the_repository);
     ++
     ++	/* Copy the existing object directory and make it an alternate. */
     ++	main_odb = the_repository->objects->odb;
     ++	new_odb = xmalloc(sizeof(*new_odb));
     ++	*new_odb = *main_odb;
     ++	*the_repository->objects->odb_tail = new_odb;
     ++	the_repository->objects->odb_tail = &(new_odb->next);
     ++	new_odb->next = NULL;
     ++
     ++	/*
     ++	 * Reinitialize the main odb with the specified path, being careful
     ++	 * to keep the next pointer value.
     ++	 */
     ++	old_next = main_odb->next;
     ++	memset(main_odb, 0, sizeof(*main_odb));
     ++	main_odb->next = old_next;
     ++	main_odb->is_temp = 1;
     ++	main_odb->path = xstrdup(dir);
     ++	return new_odb;
     ++}
     ++
     ++void restore_main_odb(struct object_directory *odb)
     ++{
     ++	struct object_directory **prev, *main_odb;
     ++
     ++	/* Unlink the saved previous main ODB from the list. */
     ++	prev = &the_repository->objects->odb->next;
     ++	assert(*prev);
     ++	while (*prev != odb) {
     ++		prev = &(*prev)->next;
     ++	}
     ++	*prev = odb->next;
     ++	if (*prev == NULL)
     ++		the_repository->objects->odb_tail = prev;
     ++
     ++	/*
     ++	 * Restore the data from the old main odb, being careful to
     ++	 * keep the next pointer value
     ++	 */
     ++	main_odb = the_repository->objects->odb;
     ++	SWAP(*main_odb, *odb);
     ++	main_odb->next = odb->next;
     ++	free_object_directory(odb);
     ++}
     ++
     + /*
     +  * Compute the exact path an alternate is at and returns it. In case of
     +  * error NULL is returned and the human readable error is added to `err`
     +@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
     + /* Finalize a file on disk, and close it. */
     + static void close_loose_object(int fd)
     + {
      -	if (fsync_object_files)
     --		fsync_or_die(fd, "loose object file");
     --	if (close(fd) != 0)
     --		die_errno(_("error when closing loose object file"));
     --}
     --
     - /* Size of directory component, including the ending '/' */
     - static inline int directory_size(const char *filename)
     ++	switch (fsync_object_files) {
     ++	case FSYNC_OBJECT_FILES_OFF:
     ++		break;
     ++	case FSYNC_OBJECT_FILES_ON:
     + 		fsync_or_die(fd, "loose object file");
     ++		break;
     ++	case FSYNC_OBJECT_FILES_BATCH:
     ++		fsync_loose_object_bulk_checkin(fd);
     ++		break;
     ++	default:
     ++		BUG("Invalid fsync_object_files mode.");
     ++	}
     ++
     + 	if (close(fd) != 0)
     + 		die_errno(_("error when closing loose object file"));
     + }
     +
     + ## object-store.h ##
     +@@ object-store.h: void add_to_alternates_file(const char *dir);
     +  */
     + void add_to_alternates_memory(const char *dir);
     + 
     ++/*
     ++ * Replace the current main object directory with the specified temporary
     ++ * object directory. We make a copy of the former main object directory,
     ++ * add it as an in-memory alternate, and return the copy so that it can
     ++ * be restored via restore_main_odb.
     ++ */
     ++struct object_directory *set_temporary_main_odb(const char *dir);
     ++
     ++/*
     ++ * Restore a previous ODB replaced by set_temporary_main_odb.
     ++ */
     ++void restore_main_odb(struct object_directory *odb);
     ++
     + /*
     +  * Populate and return the loose object cache array corresponding to the
     +  * given object ID.
     +@@ object-store.h: struct oidtree *odb_loose_cache(struct object_directory *odb,
     + /* Empty the loose object cache for the specified object directory. */
     + void odb_clear_loose_cache(struct object_directory *odb);
     + 
     ++/* Clear and free the specified object directory */
     ++void free_object_directory(struct object_directory *odb);
     ++
     + struct packed_git {
     + 	struct hashmap_entry packmap_ent;
     + 	struct packed_git *next;
     +
     + ## object.c ##
     +@@ object.c: struct raw_object_store *raw_object_store_new(void)
     + 	return o;
     + }
     + 
     +-static void free_object_directory(struct object_directory *odb)
     ++void free_object_directory(struct object_directory *odb)
       {
     -@@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
     - 		die(_("confused by unstable object source data for %s"),
     - 		    oid_to_hex(oid));
     + 	free(odb->path);
     + 	odb_clear_loose_cache(odb);
     +
     + ## tmp-objdir.c ##
     +@@
     + struct tmp_objdir {
     + 	struct strbuf path;
     + 	struct strvec env;
     ++	struct object_directory *prev_main_odb;
     + };
       
     --	close_loose_object(fd);
     --
     --	if (mtime) {
     --		struct utimbuf utb;
     --		utb.actime = mtime;
     --		utb.modtime = mtime;
     --		if (utime(tmp_file.buf, &utb) < 0)
     --			warning_errno(_("failed utime() on %s"), tmp_file.buf);
     --	}
     --
     --	return finalize_object_file(tmp_file.buf, filename.buf);
     -+	return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
     -+							 filename.buf, mtime);
     + /*
     +@@ tmp-objdir.c: static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
     + 	 * freeing memory; it may cause a deadlock if the signal
     + 	 * arrived while libc's allocator lock is held.
     + 	 */
     +-	if (!on_signal)
     ++	if (!on_signal) {
     ++		if (t->prev_main_odb)
     ++			restore_main_odb(t->prev_main_odb);
     + 		tmp_objdir_free(t);
     ++	}
     ++
     + 	return err;
       }
       
     - static int freshen_loose_object(const struct object_id *oid)
     +@@ tmp-objdir.c: struct tmp_objdir *tmp_objdir_create(void)
     + 	t = xmalloc(sizeof(*t));
     + 	strbuf_init(&t->path, 0);
     + 	strvec_init(&t->env);
     ++	t->prev_main_odb = NULL;
     + 
     + 	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
     + 
     +@@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
     + 	if (!t)
     + 		return 0;
     + 
     ++	if (t->prev_main_odb) {
     ++		restore_main_odb(t->prev_main_odb);
     ++		t->prev_main_odb = NULL;
     ++	}
     ++
     + 	strbuf_addbuf(&src, &t->path);
     + 	strbuf_addstr(&dst, get_object_directory());
     + 
     +@@ tmp-objdir.c: void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
     + {
     + 	add_to_alternates_memory(t->path.buf);
     + }
     ++
     ++void tmp_objdir_replace_main_odb(struct tmp_objdir *t)
     ++{
     ++	if (t->prev_main_odb)
     ++		BUG("the main object database is already replaced");
     ++	t->prev_main_odb = set_temporary_main_odb(t->path.buf);
     ++}
     +
     + ## tmp-objdir.h ##
     +@@ tmp-objdir.h: int tmp_objdir_destroy(struct tmp_objdir *);
     +  */
     + void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
     + 
     ++/*
     ++ * Replaces the main object store in the current process with the temporary
     ++ * object directory and makes the former main object store an alternate.
     ++ */
     ++void tmp_objdir_replace_main_odb(struct tmp_objdir *);
     ++
     + #endif /* TMP_OBJDIR_H */
      
       ## wrapper.c ##
      @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
 3:  a5b3e21b762 < -:  ----------- core.fsyncobjectfiles: add windows support for batch mode
 4:  f7f756f3932 ! 4:  485b4a767df update-index: use the bulk-checkin infrastructure
     @@ Commit message
          There is some risk with this change, since under batch fsync, the object
          files will not be available until the update-index is entirely complete.
          This usage is unlikely, since any tool invoking update-index and
     -    expecting to see objects would have to snoop the output of --verbose to
     -    find out when update-index has actually processed a given path.
     -    Additionally the index is locked for the duration of the update.
     +    expecting to see objects would have to synchronize with the update-index
     +    process after passing it a file path.
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     @@ builtin/update-index.c
       #include "lockfile.h"
       #include "quote.h"
      @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     - 		struct strbuf unquoted = STRBUF_INIT;
       
     - 		setup_work_tree();
     -+		plug_bulk_checkin();
     - 		while (getline_fn(&buf, stdin) != EOF) {
     - 			char *p;
     - 			if (!nul_term_line && buf.buf[0] == '"') {
     + 	the_index.updated_skipworktree = 1;
     + 
     ++	/* we might be adding many objects to the object database */
     ++	plug_bulk_checkin();
     ++
     + 	/*
     + 	 * Custom copy of parse_options() because we want to handle
     + 	 * filename arguments as they come.
      @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
     - 				chmod_path(set_executable_bit, p);
     - 			free(p);
     - 		}
     -+		unplug_bulk_checkin(&lock_file);
     - 		strbuf_release(&unquoted);
       		strbuf_release(&buf);
       	}
     + 
     ++	/* by now we must have added all of the new objects */
     ++	unplug_bulk_checkin();
     + 	if (split_index > 0) {
     + 		if (git_config_get_split_index() == 0)
     + 			warning(_("core.splitIndex is set to false; "
 -:  ----------- > 5:  889e7668760 unpack-objects: use the bulk-checkin infrastructure
 5:  afb0028e796 ! 6:  0f2e3b25759 core.fsyncobjectfiles: tests for batch mode
     @@ Metadata
       ## Commit message ##
          core.fsyncobjectfiles: tests for batch mode
      
     -    Add test cases to exercise batch mode for 'git add'
     -    and 'git stash'. These tests ensure that the added
     -    data winds up in the object database.
     +    Add test cases to exercise batch mode for:
     +     * 'git add'
     +     * 'git stash'
     +     * 'git update-index'
     +     * 'git unpack-objects'
      
     -    I verified the tests by introducing an incorrect rename
     -    in do_sync_and_rename.
     +    These tests ensure that the added data winds up in the object database.
     +
     +    In this change we introduce a new test helper lib-unique-files.sh. The
     +    goal of this library is to create a tree of files that have different
     +    oids from any other files that may have been created in the current test
     +    repo. This helps us avoid missing validation of an object being added due
     +    to it already being in the repo.
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     @@ t/lib-unique-files.sh (new)
      @@
      +# Helper to create files with unique contents
      +
     -+test_create_unique_files_base__=$(date -u)
     -+test_create_unique_files_counter__=0
      +
      +# Create multiple files with unique contents. Takes the number of
      +# directories, the number of files in each directory, and the base
      +# directory.
      +#
     -+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
     -+#				    each in the specified directory, all
     -+#				    with unique contents.
     ++# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
     ++#					 each in my_dir, all with unique
     ++#					 contents.
      +
      +test_create_unique_files() {
      +	test "$#" -ne 3 && BUG "3 param"
     @@ t/lib-unique-files.sh (new)
      +	local dirs=$1
      +	local files=$2
      +	local basedir=$3
     ++	local counter=0
     ++	test_tick
     ++	local basedata=$test_tick
     ++
      +
     -+	rm -rf $basedir >/dev/null
     ++	rm -rf $basedir
      +
      +	for i in $(test_seq $dirs)
      +	do
      +		local dir=$basedir/dir$i
      +
     -+		mkdir -p "$dir" > /dev/null
     ++		mkdir -p "$dir"
      +		for j in $(test_seq $files)
      +		do
     -+			test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
     -+			echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
     ++			counter=$((counter + 1))
     ++			echo "$basedata.$counter"  >"$dir/file$j.txt"
      +		done
      +	done
      +}
     @@ t/t3700-add.sh: test_expect_success \
      +	rm -f fsynced_files &&
      +	git ls-files --stage fsync-files/ > fsynced_files &&
      +	test_line_count = 8 fsynced_files &&
     -+	cat fsynced_files | awk '{print \$2}' | xargs -n1 git cat-file -e
     ++	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
     ++"
     ++
     ++test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
     ++	test_create_unique_files 2 4 fsync-files2 &&
     ++	find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
     ++	rm -f fsynced_files2 &&
     ++	git ls-files --stage fsync-files2/ > fsynced_files2 &&
     ++	test_line_count = 8 fsynced_files2 &&
     ++	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
      +"
      +
       test_expect_success \
     @@ t/t3903-stash.sh: test_expect_success 'stash handles skip-worktree entries nicel
      +	# which contains the untracked files
      +	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
      +	test_line_count = 8 fsynced_files &&
     -+	cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
     ++	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
      +"
      +
      +
       test_expect_success 'stash -c stash.useBuiltin=false warning ' '
       	expected="stash.useBuiltin support has been removed" &&
       
     +
     + ## t/t5300-pack-object.sh ##
     +@@ t/t5300-pack-object.sh: test_expect_success 'pack-objects with bogus arguments' '
     + 
     + check_unpack () {
     + 	test_when_finished "rm -rf git2" &&
     +-	git init --bare git2 &&
     +-	git -C git2 unpack-objects -n <"$1".pack &&
     +-	git -C git2 unpack-objects <"$1".pack &&
     +-	(cd .git && find objects -type f -print) |
     +-	while read path
     +-	do
     +-		cmp git2/$path .git/$path || {
     +-			echo $path differs.
     +-			return 1
     +-		}
     +-	done
     ++	git $2 init --bare git2 &&
     ++	(
     ++		git $2 -C git2 unpack-objects -n <"$1".pack &&
     ++		git $2 -C git2 unpack-objects <"$1".pack &&
     ++		git $2 -C git2 cat-file --batch-check="%(objectname)"
     ++	) <obj-list >current &&
     ++	cmp obj-list current
     + }
     + 
     + test_expect_success 'unpack without delta' '
     + 	check_unpack test-1-${packname_1}
     + '
     + 
     ++test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
     ++	check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
     ++'
     ++
     + test_expect_success 'pack with REF_DELTA' '
     + 	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
     + 	check_deltas stderr -gt 0
     +@@ t/t5300-pack-object.sh: test_expect_success 'unpack with REF_DELTA' '
     + 	check_unpack test-2-${packname_2}
     + '
     + 
     ++test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
     ++       check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
     ++'
     ++
     + test_expect_success 'pack with OFS_DELTA' '
     + 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
     + 			<obj-list 2>stderr) &&
     +@@ t/t5300-pack-object.sh: test_expect_success 'unpack with OFS_DELTA' '
     + 	check_unpack test-3-${packname_3}
     + '
     + 
     ++test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
     ++       check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
     ++'
     ++
     + test_expect_success 'compare delta flavors' '
     + 	perl -e '\''
     + 		defined($_ = -s $_) or die for @ARGV;
 6:  3e6b80b5fa2 = 7:  6543564376a core.fsyncobjectfiles: performance tests for add and stash

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 1/7] object-file.c: do not rename in a temp odb
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                           ` (7 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH
being set, create object files with their final names. This avoids
an extra rename beyond what is needed to merge the temporary ODB in
tmp_objdir_migrate.

Creating an object file with the expected final name should be okay
since the git process writing to the temporary object store is the
only writer, and it only invokes write_loose_object/create_object_file
after checking that the object doesn't exist.
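
The two naming strategies can be sketched in plain POSIX C. This is a simplified illustration, not the patch's code. `write_object_file_sketch` and its parameters are hypothetical. It uses mkstemp() and rename() where Git uses git_mkstemp_mode() and finalize_object_file(), and the latter may also use link().

```c
#define _POSIX_C_SOURCE 200809L	/* mkstemp(), mkdtemp() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: store `len` bytes of `data` as the loose object at
 * `final_path`, which lives in directory `dir`.
 *
 * If `dir_is_temp` is set (a quarantine ODB with a single writer), create
 * the file directly under its final name. O_EXCL makes open() fail with
 * EEXIST instead of overwriting an existing object. Otherwise, write to a
 * unique temporary name in the same directory and rename it into place.
 * Returns 0 on success, -1 on failure.
 */
static int write_object_file_sketch(const char *dir, const char *final_path,
				    int dir_is_temp,
				    const void *data, size_t len)
{
	char tmp[4096] = "";	/* empty means "already at its final name" */
	ssize_t written;
	int fd, close_err;

	if (dir_is_temp) {
		fd = open(final_path, O_CREAT | O_EXCL | O_WRONLY, 0444);
	} else {
		if (snprintf(tmp, sizeof(tmp), "%s/tmp_obj_XXXXXX", dir) >= (int)sizeof(tmp)) {
			errno = ENAMETOOLONG;
			return -1;
		}
		fd = mkstemp(tmp);	/* mode 0600; Git asks for 0444 */
	}
	if (fd < 0)
		return -1;

	written = write(fd, data, len);
	close_err = close(fd);
	if (written != (ssize_t)len || close_err != 0) {
		unlink(*tmp ? tmp : final_path);
		return -1;
	}

	/* Like finalize_object_file(): an empty tmp name needs no rename. */
	if (*tmp && rename(tmp, final_path) != 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

Skipping the rename relies on the condition stated above: the quarantine directory has a single writer, and that writer checks that the object does not exist before creating it.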

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 environment.c  |  4 ++++
 object-file.c  | 51 ++++++++++++++++++++++++++++++++++----------------
 object-store.h |  6 ++++++
 repository.c   |  2 ++
 repository.h   |  1 +
 5 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/environment.c b/environment.c
index d6b22ede7ea..d9ba68402e9 100644
--- a/environment.c
+++ b/environment.c
@@ -177,6 +177,10 @@ void setup_git_env(const char *git_dir)
 	args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT);
 	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
 	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
+	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+		args.object_dir_is_temp = 1;
+	}
+
 	repo_set_gitdir(the_repository, git_dir, &args);
 	strvec_clear(&to_free);
 
diff --git a/object-file.c b/object-file.c
index a8be8994814..ab593515cec 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1800,12 +1800,17 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 }
 
 /*
- * Move the just written object into its final resting place.
+ * Move the just written object into its final resting place,
+ * unless it is already there, as indicated by an empty string for
+ * tmpfile.
  */
 int finalize_object_file(const char *tmpfile, const char *filename)
 {
 	int ret = 0;
 
+	if (!*tmpfile)
+		goto out;
+
 	if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
 		goto try_rename;
 	else if (link(tmpfile, filename))
@@ -1878,21 +1883,37 @@ static inline int directory_size(const char *filename)
 }
 
 /*
- * This creates a temporary file in the same directory as the final
- * 'filename'
+ * This creates a loose object file for the specified object id.
+ * If we're working in a temporary object directory, the file is
+ * created with its final filename, otherwise it is created with
+ * a temporary name and renamed by finalize_object_file.
+ * If no rename is required, an empty string is returned in tmp.
  *
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+static int create_objfile(const struct object_id *oid, struct strbuf *tmp,
+			  struct strbuf *filename)
 {
-	int fd, dirlen = directory_size(filename);
+	int fd, dirlen, is_retrying = 0;
+	const char *object_name;
+	static const int object_mode = 0444;
 
+	loose_object_path(the_repository, filename, oid);
+	dirlen = directory_size(filename->buf);
+
+retry_create:
 	strbuf_reset(tmp);
-	strbuf_add(tmp, filename, dirlen);
-	strbuf_addstr(tmp, "tmp_obj_XXXXXX");
-	fd = git_mkstemp_mode(tmp->buf, 0444);
-	if (fd < 0 && dirlen && errno == ENOENT) {
+	if (!the_repository->objects->odb->is_temp) {
+		strbuf_add(tmp, filename->buf, dirlen);
+		object_name = "tmp_obj_XXXXXX";
+		strbuf_addstr(tmp, object_name);
+		fd = git_mkstemp_mode(tmp->buf, object_mode);
+	} else {
+		fd = open(filename->buf, O_CREAT | O_EXCL | O_RDWR, object_mode);
+	}
+
+	if (fd < 0 && dirlen && errno == ENOENT && !is_retrying) {
 		/*
 		 * Make sure the directory exists; note that the contents
 		 * of the buffer are undefined after mkstemp returns an
@@ -1900,15 +1921,15 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename)
 		 * scratch.
 		 */
 		strbuf_reset(tmp);
-		strbuf_add(tmp, filename, dirlen - 1);
+		strbuf_add(tmp, filename->buf, dirlen - 1);
 		if (mkdir(tmp->buf, 0777) && errno != EEXIST)
 			return -1;
 		if (adjust_shared_perm(tmp->buf))
 			return -1;
 
 		/* Try again */
-		strbuf_addstr(tmp, "/tmp_obj_XXXXXX");
-		fd = git_mkstemp_mode(tmp->buf, 0444);
+		is_retrying = 1;
+		goto retry_create;
 	}
 	return fd;
 }
@@ -1925,14 +1946,12 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
-	loose_object_path(the_repository, &filename, oid);
-
-	fd = create_tmpfile(&tmp_file, filename.buf);
+	fd = create_objfile(oid, &tmp_file, &filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
 		else
-			return error_errno(_("unable to create temporary file"));
+			return error_errno(_("unable to create object file"));
 	}
 
 	/* Set it up */
diff --git a/object-store.h b/object-store.h
index b4dc6668aa2..f8c883a5730 100644
--- a/object-store.h
+++ b/object-store.h
@@ -26,6 +26,12 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This is a temporary object store, so there is no need to
+	 * create new objects via rename.
+	 */
+	int is_temp;
+
 	/*
 	 * Path to the alternative object store. If this is a relative path,
 	 * it is relative to the current working directory.
diff --git a/repository.c b/repository.c
index b2bf44c6faf..a16de04dfa8 100644
--- a/repository.c
+++ b/repository.c
@@ -80,6 +80,8 @@ void repo_set_gitdir(struct repository *repo,
 	expand_base_dir(&repo->objects->odb->path, o->object_dir,
 			repo->commondir, "objects");
 
+	repo->objects->odb->is_temp = o->object_dir_is_temp;
+
 	free(repo->objects->alternate_db);
 	repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
 	expand_base_dir(&repo->graft_file, o->graft_file,
diff --git a/repository.h b/repository.h
index 3740c93bc0f..d3711367a6f 100644
--- a/repository.h
+++ b/repository.h
@@ -162,6 +162,7 @@ struct set_gitdir_args {
 	const char *graft_file;
 	const char *index_file;
 	const char *alternate_db;
+	int object_dir_is_temp;
 };
 
 void repo_set_gitdir(struct repository *repo, const char *root,
-- 
gitgitgadget


* [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                           ` (6 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'.  This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
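
The plugging problem in the second bullet can be reduced to a small model. The names below (old_state, new_state, old_finish, new_finish) are hypothetical, and the structs keep only one field of the real bulk_checkin_state. The point is that zeroing a struct also zeroes any flag stored inside it.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified model of bulk-checkin.c before this patch: the plugged flag
 * is a member of the state that finish_bulk_checkin() zeroes.
 */
struct old_state {
	unsigned plugged:1;
	unsigned nr_written;
};

/* After this patch: the flag lives outside the per-pack state. */
static int new_plugged;
struct new_state {
	unsigned nr_written;
};

/* Stand-ins for finish_bulk_checkin() closing out a full packfile. */
static void old_finish(struct old_state *s)
{
	memset(s, 0, sizeof(*s));	/* also clears s->plugged */
}

static void new_finish(struct new_state *s)
{
	memset(s, 0, sizeof(*s));	/* new_plugged is untouched */
}
```

With the old layout, a packfile rollover in the middle of a plugged operation silently turns plugging off. With the new layout, only unplug_bulk_checkin() clears the flag.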

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget


* [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 21:47           ` Neeraj Singh
  2021-09-24 20:12         ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                           ` (5 subsequent siblings)
  8 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which issues a pagecache writeback request reliably after version 5.2.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.
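
Because `batch` is not a boolean, it has to be recognized before the value is handed to boolean parsing. Git's git_config_bool() would not accept it. The sketch below shows that ordering. `parse_bool_sketch` is a hypothetical, simplified stand-in for Git's boolean parser, which additionally accepts integers.

```c
#define _POSIX_C_SOURCE 200809L	/* strcasecmp() */
#include <assert.h>
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Hypothetical, simplified boolean parser: 1 for true, 0 for false,
 * -1 if the text is not a recognized boolean. A NULL value models a bare
 * "fsyncObjectFiles" key with no "=", which Git treats as true.
 */
static int parse_bool_sketch(const char *value)
{
	static const char *const yes[] = { "true", "yes", "on", "1" };
	static const char *const no[] = { "false", "no", "off", "0", "" };
	size_t i;

	if (!value)
		return 1;
	for (i = 0; i < sizeof(yes) / sizeof(*yes); i++)
		if (!strcasecmp(value, yes[i]))
			return 1;
	for (i = 0; i < sizeof(no) / sizeof(*no); i++)
		if (!strcasecmp(value, no[i]))
			return 0;
	return -1;
}

/* Check for "batch" first, then fall back to the boolean meanings. */
static int parse_fsync_object_files(const char *value,
				    enum fsync_object_files_mode *mode)
{
	int b;

	if (value && !strcmp(value, "batch")) {
		*mode = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = parse_bool_sketch(value);
	if (b < 0)
		return -1;
	*mode = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```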

When the new mode is enabled we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin we:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the case today,
   but may be a good extension to those components.
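
The ordering above can be sketched with plain POSIX calls plus Linux's sync_file_range(). This is an illustration under assumptions, not the patch's code. batch_add, writeout_only, and the `incoming` directory name are hypothetical, and the sync_file_range() flags are one plausible way to express "write out and wait". The sketch shows ordering only. Whether a single fsync of a dummy file really flushes the drive cache for the earlier writes depends on the filesystem, as the next paragraph notes.

```c
#define _GNU_SOURCE	/* sync_file_range(), mkstemp(), mkdtemp() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Push dirty pages to the device without asking it to flush its cache. */
static int writeout_only(int fd)
{
#ifdef __linux__
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	return fsync(fd);	/* assumption: no cheaper primitive available */
#endif
}

/*
 * Hypothetical batch: create every object in objdir/incoming, issue one
 * full fsync on a dummy file, then rename the objects into objdir.
 */
static int batch_add(const char *objdir, const char *const *names,
		     const char *const *payloads, size_t n)
{
	char tmpdir[1024], src[2048], dst[2048];
	size_t i;
	int fd;

	snprintf(tmpdir, sizeof(tmpdir), "%s/incoming", objdir);
	if (mkdir(tmpdir, 0700) != 0)
		return -1;

	/* Per object: create it in the temp dir and wait for writeout. */
	for (i = 0; i < n; i++) {
		size_t len = strlen(payloads[i]);
		ssize_t w;

		snprintf(src, sizeof(src), "%s/%s", tmpdir, names[i]);
		fd = open(src, O_CREAT | O_EXCL | O_WRONLY, 0444);
		if (fd < 0)
			return -1;
		w = write(fd, payloads[i], len);
		/* Like the patch, fall back to a full fsync if writeout fails. */
		if (w != (ssize_t)len || (writeout_only(fd) < 0 && fsync(fd) < 0)) {
			close(fd);
			return -1;
		}
		if (close(fd) != 0)
			return -1;
	}

	/* At unplug time: one hardware flush via a throwaway file. */
	snprintf(src, sizeof(src), "%s/bulk_fsync_XXXXXX", tmpdir);
	fd = mkstemp(src);
	if (fd < 0)
		return -1;
	if (fsync(fd) != 0) {
		close(fd);
		unlink(src);
		return -1;
	}
	close(fd);
	unlink(src);

	/* Only after the flush do objects appear under their final names. */
	for (i = 0; i < n; i++) {
		snprintf(src, sizeof(src), "%s/%s", tmpdir, names[i]);
		snprintf(dst, sizeof(dst), "%s/%s", objdir, names[i]);
		if (rename(src, dst) != 0)
			return -1;
	}
	return rmdir(tmpdir);
}
```

The cost saving comes from paying for one drive-cache flush per batch instead of one per object. The renames come last so that no object becomes visible under its final name before that flush.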

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename), such as NTFS, HFS+, or XFS, we
would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.
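
The macOS distinction can be sketched as a small helper. The names full_fsync_sketch and write_durably are hypothetical, and falling back to fsync(2) when F_FULLFSYNC fails is an assumption, since not every filesystem supports that request. The patch's own wrapper is not reproduced here.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical "flush all the way to stable storage" helper. On macOS,
 * fsync(2) hands data to the drive but does not ask the drive to empty
 * its write cache; fcntl(fd, F_FULLFSYNC) does. Elsewhere, plain
 * fsync(2) is used.
 */
static int full_fsync_sketch(int fd)
{
#ifdef __APPLE__
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return 0;
	/* Assumption: fall back if the filesystem rejects F_FULLFSYNC. */
#endif
	return fsync(fd);
}

/* Example use: write a small file and flush it before returning. */
static int write_durably(const char *path, const char *buf, size_t len)
{
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	ssize_t w;
	int err;

	if (fd < 0)
		return -1;
	w = write(fd, buf, len);
	err = (w != (ssize_t)len) || full_fsync_sketch(fd) != 0;
	if (close(fd) != 0)
		err = 1;
	return err ? -1 : 0;
}
```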

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 29 ++++++++++++---
 Makefile                      |  6 +++
 builtin/add.c                 |  1 +
 bulk-checkin.c                | 70 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 +
 cache.h                       |  8 +++-
 config.c                      |  7 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 67 ++++++++++++++++++++++++++++++++-
 object-store.h                | 16 ++++++++
 object.c                      |  2 +-
 tmp-objdir.c                  | 20 +++++++++-
 tmp-objdir.h                  |  6 +++
 wrapper.c                     | 44 ++++++++++++++++++++++
 write-or-die.c                |  2 +-
 18 files changed, 285 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..200b4d9f06e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,29 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls loose objects in the object store, so updates to any
+	refs or the index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each loose object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware. Git does not currently fsync parent directories for
+  newly-added files, so some filesystems may still allow data to be lost on
+  system crash.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write loose object data with a minimal set of FLUSH
+  CACHE (or equivalent) commands sent to the storage controller. If the
+  operating system interfaces are not available, this mode behaves the same as
+  `true`. This mode is expected to be as safe as `true` on macOS for repos
+  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
+  ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/builtin/add.c b/builtin/add.c
index 2244311d485..9d9897cf037 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -678,6 +678,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
 	if (chmod_arg && pathspec.nr)
 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
+
 	unplug_bulk_checkin();
 
 finish:
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..957a6238684 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -62,6 +68,34 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur.  The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+	}
+
+	if (bulk_fsync_objdir)
+		tmp_objdir_migrate(bulk_fsync_objdir);
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +290,26 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
+
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * as part of do_batch_fsync.
+	 */
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		assert(the_repository->objects->odb->is_temp);
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -270,6 +324,20 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * Create a temporary object directory if the current
+	 * object directory is not already temporary.
+	 */
+	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+	    !the_repository->objects->odb->is_temp) {
+		bulk_fsync_objdir = tmp_objdir_create();
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
+
+		tmp_objdir_replace_main_odb(bulk_fsync_objdir);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -279,4 +347,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index d23de693680..d1897fe9d92 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum fsync_object_files_mode {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum fsync_object_files_mode fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..1b403e00241 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d9ba68402e9..f318d59e585 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum fsync_object_files_mode fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index ab593515cec..ec22560dd66 100644
--- a/object-file.c
+++ b/object-file.c
@@ -750,6 +750,60 @@ void add_to_alternates_memory(const char *reference)
 			     '\n', NULL, 0);
 }
 
+struct object_directory *set_temporary_main_odb(const char *dir)
+{
+	struct object_directory *main_odb, *new_odb, *old_next;
+
+	/*
+	 * Make sure alternates are initialized, or else our entry may be
+	 * overwritten when they are.
+	 */
+	prepare_alt_odb(the_repository);
+
+	/* Copy the existing object directory and make it an alternate. */
+	main_odb = the_repository->objects->odb;
+	new_odb = xmalloc(sizeof(*new_odb));
+	*new_odb = *main_odb;
+	*the_repository->objects->odb_tail = new_odb;
+	the_repository->objects->odb_tail = &(new_odb->next);
+	new_odb->next = NULL;
+
+	/*
+	 * Reinitialize the main odb with the specified path, being careful
+	 * to keep the next pointer value.
+	 */
+	old_next = main_odb->next;
+	memset(main_odb, 0, sizeof(*main_odb));
+	main_odb->next = old_next;
+	main_odb->is_temp = 1;
+	main_odb->path = xstrdup(dir);
+	return new_odb;
+}
+
+void restore_main_odb(struct object_directory *odb)
+{
+	struct object_directory **prev, *main_odb;
+
+	/* Unlink the saved previous main ODB from the list. */
+	prev = &the_repository->objects->odb->next;
+	assert(*prev);
+	while (*prev != odb) {
+		prev = &(*prev)->next;
+	}
+	*prev = odb->next;
+	if (*prev == NULL)
+		the_repository->objects->odb_tail = prev;
+
+	/*
+	 * Restore the data from the old main odb, being careful to
+	 * keep the next pointer value
+	 */
+	main_odb = the_repository->objects->odb;
+	SWAP(*main_odb, *odb);
+	main_odb->next = odb->next;
+	free_object_directory(odb);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
@@ -1867,8 +1921,19 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 /* Finalize a file on disk, and close it. */
 static void close_loose_object(int fd)
 {
-	if (fsync_object_files)
+	switch (fsync_object_files) {
+	case FSYNC_OBJECT_FILES_OFF:
+		break;
+	case FSYNC_OBJECT_FILES_ON:
 		fsync_or_die(fd, "loose object file");
+		break;
+	case FSYNC_OBJECT_FILES_BATCH:
+		fsync_loose_object_bulk_checkin(fd);
+		break;
+	default:
+		BUG("Invalid fsync_object_files mode.");
+	}
+
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
 }
diff --git a/object-store.h b/object-store.h
index f8c883a5730..9bea14e7f3b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -62,6 +62,19 @@ void add_to_alternates_file(const char *dir);
  */
 void add_to_alternates_memory(const char *dir);
 
+/*
+ * Replace the current main object directory with the specified temporary
+ * object directory. We make a copy of the former main object directory,
+ * add it as an in-memory alternate, and return the copy so that it can
+ * be restored via restore_main_odb.
+ */
+struct object_directory *set_temporary_main_odb(const char *dir);
+
+/*
+ * Restore a previous ODB replaced by set_temporary_main_odb.
+ */
+void restore_main_odb(struct object_directory *odb);
+
 /*
  * Populate and return the loose object cache array corresponding to the
  * given object ID.
@@ -72,6 +85,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
 /* Empty the loose object cache for the specified object directory. */
 void odb_clear_loose_cache(struct object_directory *odb);
 
+/* Clear and free the specified object directory */
+void free_object_directory(struct object_directory *odb);
+
 struct packed_git {
 	struct hashmap_entry packmap_ent;
 	struct packed_git *next;
diff --git a/object.c b/object.c
index 4e85955a941..98635bc4043 100644
--- a/object.c
+++ b/object.c
@@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
 	return o;
 }
 
-static void free_object_directory(struct object_directory *odb)
+void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index b8d880e3626..f027c49db4c 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -11,6 +11,7 @@
 struct tmp_objdir {
 	struct strbuf path;
 	struct strvec env;
+	struct object_directory *prev_main_odb;
 };
 
 /*
@@ -50,8 +51,12 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	 * freeing memory; it may cause a deadlock if the signal
 	 * arrived while libc's allocator lock is held.
 	 */
-	if (!on_signal)
+	if (!on_signal) {
+		if (t->prev_main_odb)
+			restore_main_odb(t->prev_main_odb);
 		tmp_objdir_free(t);
+	}
+
 	return err;
 }
 
@@ -132,6 +137,7 @@ struct tmp_objdir *tmp_objdir_create(void)
 	t = xmalloc(sizeof(*t));
 	strbuf_init(&t->path, 0);
 	strvec_init(&t->env);
+	t->prev_main_odb = NULL;
 
 	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
 
@@ -269,6 +275,11 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
 	if (!t)
 		return 0;
 
+	if (t->prev_main_odb) {
+		restore_main_odb(t->prev_main_odb);
+		t->prev_main_odb = NULL;
+	}
+
 	strbuf_addbuf(&src, &t->path);
 	strbuf_addstr(&dst, get_object_directory());
 
@@ -292,3 +303,10 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
 {
 	add_to_alternates_memory(t->path.buf);
 }
+
+void tmp_objdir_replace_main_odb(struct tmp_objdir *t)
+{
+	if (t->prev_main_odb)
+		BUG("the main object database is already replaced");
+	t->prev_main_odb = set_temporary_main_odb(t->path.buf);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
index b1e45b4c75d..4b898add05b 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -51,4 +51,10 @@ int tmp_objdir_destroy(struct tmp_objdir *);
  */
 void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
 
+/*
+ * Replaces the main object store in the current process with the temporary
+ * object directory and makes the former main object store an alternate.
+ */
+void tmp_objdir_replace_main_odb(struct tmp_objdir *);
+
 #endif /* TMP_OBJDIR_H */
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * on macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget

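The pointer handling in set_temporary_main_odb() and restore_main_odb() can be modeled on a toy singly linked list. The head of the list is the main object directory and the remaining nodes are alternates. This is a simplified sketch, not git's code. struct demo_odb, struct demo_store and both demo_* helpers are hypothetical. The sketch also assumes that the saved copy of the old main directory is appended at the tail of the list, as an in-memory alternate would be.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for git's struct object_directory. */
struct demo_odb {
	struct demo_odb *next;
	char *path;
	int is_temp;
};

/* Head of the list is the main (writable) ODB; the rest are alternates. */
struct demo_store {
	struct demo_odb *odb;
	struct demo_odb **odb_tail; /* address of the last node's next field */
};

static char *demo_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (!p)
		abort();
	return memcpy(p, s, n);
}

/* Copy the main ODB into a new tail alternate, then reuse the head node for dir. */
static struct demo_odb *demo_set_temporary_main(struct demo_store *st,
						const char *dir)
{
	struct demo_odb *main_odb = st->odb;
	struct demo_odb *saved = malloc(sizeof(*saved));
	struct demo_odb *old_next;

	if (!saved)
		abort();
	*saved = *main_odb;		/* the copy now owns the old path */
	saved->next = NULL;
	*st->odb_tail = saved;		/* append as the last alternate */
	st->odb_tail = &saved->next;

	old_next = main_odb->next;	/* may be 'saved' itself */
	memset(main_odb, 0, sizeof(*main_odb));
	main_odb->next = old_next;
	main_odb->is_temp = 1;
	main_odb->path = demo_strdup(dir);
	return saved;
}

/* Unlink 'saved', swap its contents back into the head node, free the temp data. */
static void demo_restore_main(struct demo_store *st, struct demo_odb *saved)
{
	struct demo_odb **prev = &st->odb->next;
	struct demo_odb *main_odb = st->odb;
	struct demo_odb tmp;

	while (*prev != saved) {
		assert(*prev);		/* 'saved' must be on the list */
		prev = &(*prev)->next;
	}
	*prev = saved->next;
	if (!*prev)
		st->odb_tail = prev;	/* 'saved' was the tail */

	tmp = *main_odb;
	*main_odb = *saved;
	*saved = tmp;
	main_odb->next = saved->next;	/* keep the head's current next pointer */
	free(saved->path);
	free(saved);
}
```

The head node stays in place and only its contents are swapped, so any pointer to the main object directory held elsewhere remains valid. That is presumably why the patch copies and swaps struct contents rather than relinking nodes.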

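Together, git_fsync() and the FSYNC_OBJECT_FILES_BATCH case in close_loose_object() support a specific flow. Each object gets one page-cache writeback, and the whole batch then gets a single hardware flush. The POSIX C sketch below illustrates that flow under several assumptions. demo_fsync() and demo_batch_write() are hypothetical, and `__linux__` stands in for the patch's HAVE_SYNC_FILE_RANGE check. The fallback to a full flush when writeback-only is unavailable is this sketch's own choice. The sketch also relies on the series' premise that one fsync on a dummy file drains the drive's write cache of the earlier writes.

```c
#define _GNU_SOURCE /* exposes sync_file_range() and mkdtemp() on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical mirror of the patch's two fsync actions. */
enum demo_fsync_action { DEMO_WRITEOUT_ONLY, DEMO_HARDWARE_FLUSH };

static int demo_fsync(int fd, enum demo_fsync_action action)
{
	if (action == DEMO_WRITEOUT_ONLY) {
#if defined(__APPLE__)
		/* macOS fsync() writes back the page cache but skips the drive flush. */
		return fsync(fd);
#elif defined(__linux__)
		/* Offset 0, length 0 means the whole file; wait for the writeback. */
		return sync_file_range(fd, 0, 0,
				       SYNC_FILE_RANGE_WAIT_BEFORE |
				       SYNC_FILE_RANGE_WRITE |
				       SYNC_FILE_RANGE_WAIT_AFTER);
#else
		errno = ENOSYS;
		return -1;
#endif
	}
#if defined(__APPLE__)
	return fcntl(fd, F_FULLFSYNC);
#else
	return fsync(fd);
#endif
}

/*
 * Write bufs[0..n-1] to dir/tmp_<i> with a writeback-only sync each, then
 * issue one hardware flush on a dummy file and only afterwards rename the
 * temporaries to dir/obj_<i>. Returns 0 on success, -1 on error.
 */
static int demo_batch_write(const char *dir, const char **bufs, int n)
{
	char tmp[4096], dst[4096];
	int i, fd;

	for (i = 0; i < n; i++) {
		size_t len = strlen(bufs[i]);

		snprintf(tmp, sizeof(tmp), "%s/tmp_%d", dir, i);
		fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
		if (fd < 0)
			return -1;
		/* Fall back to a full flush if writeback-only is unsupported. */
		if (write(fd, bufs[i], len) != (ssize_t)len ||
		    (demo_fsync(fd, DEMO_WRITEOUT_ONLY) < 0 &&
		     demo_fsync(fd, DEMO_HARDWARE_FLUSH) < 0)) {
			close(fd);
			return -1;
		}
		if (close(fd) < 0)
			return -1;
	}

	/* A single hardware flush for the whole batch. */
	snprintf(tmp, sizeof(tmp), "%s/bulk_fsync_dummy", dir);
	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (demo_fsync(fd, DEMO_HARDWARE_FLUSH) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	unlink(tmp);

	/* Make the objects visible under their final names only now. */
	for (i = 0; i < n; i++) {
		snprintf(tmp, sizeof(tmp), "%s/tmp_%d", dir, i);
		snprintf(dst, sizeof(dst), "%s/obj_%d", dir, i);
		if (rename(tmp, dst) < 0)
			return -1;
	}
	return 0;
}
```

Renaming only after the flush means a crash cannot leave a final object name pointing at contents that never reached stable storage. Whether the renames themselves are durable still depends on the filesystem's journaling, which the cover letter discusses.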

* [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-09-24 20:12         ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 21:49           ` Neeraj Singh
  2021-09-24 20:12         ` [PATCH v5 5/7] unpack-objects: " Neeraj Singh via GitGitGadget
                           ` (4 subsequent siblings)
  8 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin for update-index infrastructure to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality. This mode
is enabled when passing paths to update-index via the --stdin flag,
as is done by 'git stash'.

There is some risk with this change, since under batch fsync, the object
files will not be available until update-index has entirely completed.
A caller depending on them earlier is unlikely, since any tool invoking
update-index and expecting to see the objects would have to synchronize
with the update-index process after passing it a file path.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..dc7368bb1ee 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1088,6 +1089,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1168,6 +1172,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "
-- 
gitgitgadget


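Bracketing the work in cmd_update_index() (and, in the next patch, unpack_all()) with plug_bulk_checkin() and unplug_bulk_checkin() changes the cost of N new objects under core.fsyncObjectFiles=batch. Instead of N hardware flushes, the command issues N writebacks plus one flush. The counter model below is a hypothetical sketch of that accounting only. It performs no I/O, and the demo_* names merely approximate the state kept in bulk-checkin.c. It also assumes that a batch-mode write made while unplugged falls back to a per-object flush.

```c
#include <assert.h>

/* Hypothetical counters; the real state lives in bulk-checkin.c. */
static int demo_plugged;	/* boolean, like the series' 'plugged' flag */
static int demo_pending;	/* objects written back but not yet flushed */
static int demo_hw_flushes;	/* simulated hardware flush commands */

static void demo_plug_bulk_checkin(void)
{
	demo_plugged = 1;
}

/* Account for one new loose object under the batch policy. */
static void demo_write_loose_object(void)
{
	if (demo_plugged)
		demo_pending++;		/* writeback only; flush deferred */
	else
		demo_hw_flushes++;	/* not plugged: flush this object now */
}

static void demo_unplug_bulk_checkin(void)
{
	demo_plugged = 0;
	if (demo_pending) {
		demo_hw_flushes++;	/* one flush covers the whole batch */
		demo_pending = 0;
	}
}
```

For example, writing 100 objects between plug and unplug costs one simulated flush, while a single unplugged write costs one flush by itself.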

* [PATCH v5 5/7] unpack-objects: use the bulk-checkin infrastructure
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                           ` (3 preceding siblings ...)
  2021-09-24 20:12         ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
                           ` (3 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295ba..51eb4f7b531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)
-- 
gitgitgadget



* [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                           ` (4 preceding siblings ...)
  2021-09-24 20:12         ` [PATCH v5 5/7] unpack-objects: " Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 20:12         ` [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
                           ` (2 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add test cases to exercise batch mode for:
 * 'git add'
 * 'git stash'
 * 'git update-index'
 * 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper lib-unique-files.sh. The
goal of this library is to create a tree of files that have different
oids from any other files that may have been created in the current test
repo. This ensures that each test really exercises writing new objects,
rather than passing because an object with the same contents already
existed in the repo.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/lib-unique-files.sh  | 36 ++++++++++++++++++++++++++++++++++++
 t/t3700-add.sh         | 20 ++++++++++++++++++++
 t/t3903-stash.sh       | 14 ++++++++++++++
 t/t5300-pack-object.sh | 30 +++++++++++++++++++-----------
 4 files changed, 89 insertions(+), 11 deletions(-)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a7de4ca8512
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,36 @@
+# Helper to create files with unique contents
+
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
+#					 each in my_dir, all with unique
+#					 contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+	local counter=0
+	test_tick
+	local basedata=$test_tick
+
+
+	rm -rf "$basedir"
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir"
+		for j in $(test_seq $files)
+		do
+			counter=$((counter + 1))
+			echo "$basedata.$counter"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..36049a53ff7 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.'
 
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -33,6 +35,24 @@ test_expect_success \
     'Test that "git add -- -q" works' \
     'touch -- -q && git add -- -q'
 
+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files2 &&
+	find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
+	rm -f fsynced_files2 &&
+	git ls-files --stage fsync-files2/ > fsynced_files2 &&
+	test_line_count = 8 fsynced_files2 &&
+	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 873aa56e359..2fc819e5584 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index e13a8842075..38663dc1393 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -162,23 +162,23 @@ test_expect_success 'pack-objects with bogus arguments' '
 
 check_unpack () {
 	test_when_finished "rm -rf git2" &&
-	git init --bare git2 &&
-	git -C git2 unpack-objects -n <"$1".pack &&
-	git -C git2 unpack-objects <"$1".pack &&
-	(cd .git && find objects -type f -print) |
-	while read path
-	do
-		cmp git2/$path .git/$path || {
-			echo $path differs.
-			return 1
-		}
-	done
+	git $2 init --bare git2 &&
+	(
+		git $2 -C git2 unpack-objects -n <"$1".pack &&
+		git $2 -C git2 unpack-objects <"$1".pack &&
+		git $2 -C git2 cat-file --batch-check="%(objectname)"
+	) <obj-list >current &&
+	cmp obj-list current
 }
 
 test_expect_success 'unpack without delta' '
 	check_unpack test-1-${packname_1}
 '
 
+test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
+	check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with REF_DELTA' '
 	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
 	check_deltas stderr -gt 0
@@ -188,6 +188,10 @@ test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
+test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
+       check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with OFS_DELTA' '
 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
 			<obj-list 2>stderr) &&
@@ -198,6 +202,10 @@ test_expect_success 'unpack with OFS_DELTA' '
 	check_unpack test-3-${packname_3}
 '
 
+test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
+       check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
 		defined($_ = -s $_) or die for @ARGV;
-- 
gitgitgadget


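The content scheme of test_create_unique_files combines a per-run base value from test_tick with a running counter. This is what makes every file, and therefore every blob ID, distinct. A hypothetical C counterpart of the helper is sketched below for illustration. Unlike the shell version, it takes the base value as a parameter and does not delete an existing base directory first.

```c
#define _POSIX_C_SOURCE 200809L /* for mkdtemp() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical C counterpart of test_create_unique_files: creates
 * basedir/dir<i>/file<j>.txt for 1 <= i <= dirs, 1 <= j <= files, each
 * containing "<base>.<counter>\n" with a counter that never repeats.
 * Returns the number of files written, or -1 on error.
 */
static int demo_create_unique_files(int dirs, int files,
				    const char *basedir, long base)
{
	char path[4096];
	int counter = 0;

	if (mkdir(basedir, 0777) < 0 && errno != EEXIST)
		return -1;
	for (int i = 1; i <= dirs; i++) {
		snprintf(path, sizeof(path), "%s/dir%d", basedir, i);
		if (mkdir(path, 0777) < 0 && errno != EEXIST)
			return -1;
		for (int j = 1; j <= files; j++) {
			FILE *f;

			snprintf(path, sizeof(path), "%s/dir%d/file%d.txt",
				 basedir, i, j);
			f = fopen(path, "w");
			if (!f)
				return -1;
			counter++;
			fprintf(f, "%ld.%d\n", base, counter);
			if (fclose(f) != 0)
				return -1;
		}
	}
	return counter;
}
```

The t3700 tests use 2 directories of 4 files each. With those arguments, this yields eight files whose contents end in .1 through .8, matching the `test_line_count = 8` checks.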

* [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                           ` (5 preceding siblings ...)
  2021-09-24 20:12         ` [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-24 20:12         ` Neeraj Singh via GitGitGadget
  2021-09-24 23:31         ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 20:12 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget


* Re: [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes
  2021-09-24 20:12         ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-24 21:47           ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-24 21:47 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: Git List, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Fri, Sep 24, 2021 at 1:12 PM Neeraj Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Neeraj Singh <neerajsi@microsoft.com>
>
> diff --git a/builtin/add.c b/builtin/add.c
> index 2244311d485..9d9897cf037 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -678,6 +678,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
>
>         if (chmod_arg && pathspec.nr)
>                 exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
> +
>         unplug_bulk_checkin();
>
>  finish:

I'll remove this stray change on re-roll.


* Re: [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure
  2021-09-24 20:12         ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-24 21:49           ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-24 21:49 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: Git List, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Fri, Sep 24, 2021 at 1:12 PM Neeraj Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Neeraj Singh <neerajsi@microsoft.com>
>
> The update-index functionality is used internally by 'git stash push' to
> set up the internal stashed commit.
>
> This change enables bulk-checkin for update-index infrastructure to
> speed up adding new objects to the object database by leveraging the
> pack functionality and the new bulk-fsync functionality. This mode
> is enabled when passing paths to update-index via the --stdin flag,
> as is done by 'git stash'.

This part of the description is now inaccurate. All modes of update-index are
now enlightened to use bulk_checkin. I'll just remove the sentence that
scopes the change to --stdin on reroll.

>
> There is some risk with this change, since under batch fsync, the object
> files will not be available until the update-index is entirely complete.
> This usage is unlikely, since any tool invoking update-index and
> expecting to see objects would have to synchronize with the update-index
> process after passing it a file path.
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  builtin/update-index.c | 6 ++++++
>  1 file changed, 6 insertions(+)


* Re: [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                           ` (6 preceding siblings ...)
  2021-09-24 20:12         ` [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-09-24 23:31         ` Neeraj Singh
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-24 23:31 UTC (permalink / raw)
  To: Neeraj K. Singh via GitGitGadget
  Cc: Git List, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Fri, Sep 24, 2021 at 1:12 PM Neeraj K. Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Thanks to everyone for review so far! Changes since v4, all in response to
> review feedback from Ævar Arnfjörð Bjarmason:
>
>  * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
>    to add a statement about not fsyncing parent directories.
>
>    * I still don't want to make any promises on behalf of the Linux FS developers
>      in the documentation. However, according to [v4.1] and my understanding
>      of how XFS journals are documented to work, it looks like recent versions
>      of Linux running on XFS should be as safe as Windows or macOS in 'batch'
>      mode. I don't know about ext4, since it's not clear to me when metadata
>      updates are made visible to the journal.
>
>
>  * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
>    pointed out, this lets us access the added loose objects immediately,
>    rather than only after unplugging the bulk checkin. This is a hard
>    requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
>
>    * As a preparatory patch, the object-file code now doesn't do a rename if it's in a
>      tmp objdir (as determined by the quarantine environment variable).
>
>    * I added support to the tmp-objdir lib to replace the 'main' writable odb.
>
>    * Instead of using a lockfile for the final full fsync, we now use a new dummy
>      temp file. Doing that makes the below unpack-objects change easier.
>
>
>  * Add bulk-checkin support to unpack-objects, which is used in fetch and
>    push. In addition to making those operations faster, it allows us to
>    directly compare performance of packfiles against loose objects. Please
>    see [v4.2] for a measurement of 'git push' to a local upstream with
>    different numbers of unique new files.
>
>  * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.
>
>  * Remove comment with link to NtFlushBuffersFileEx documentation.
>
>  * Make t/lib-unique-files.sh a bit cleaner. We are still creating unique
>    contents, but now this uses test_tick, so it should be deterministic from
>    run to run.
>
>  * Ensure there are tests for all of the modified commands. Make the
>    unpack-objects tests validate that the unpacked objects are really
>    available in the ODB.
>
> References for v4: [v4.1]
> https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/#t
>
> [v4.2]
> https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117
>
> Changes since v3:
>
>  * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
>    accept no value to mean "true" and we require 'batch' to be lowercase.
>
>  * Leave the default fsync mode as 'false'. Git for Windows can change its
>    default when this series makes it over to that fork.
>
>  * Use a switch statement in git_fsync, as suggested by Junio.
>
>  * Add regression test cases for core.fsyncobjectfiles=batch. This should
>    keep the batch functionality basically working in upstream git even if
>    few users adopt batch mode initially. I expect git-for-windows will
>    provide a good baking area for the new mode.
>
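The "Changes since v3" entry above describes a small parsing rule for core.fsyncObjectFiles: no value means "true", and "batch" must be lowercase. The following is a minimal sketch that assumes a simplified boolean parser. The demo_* names are hypothetical, and git's own git_parse_maybe_bool() accepts more spellings (for example, arbitrary integers) than the short list here.

```c
#define _POSIX_C_SOURCE 200809L /* for strcasecmp() */
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical mirror of the three fsync_object_files modes. */
enum demo_fsync_mode {
	DEMO_FSYNC_OFF,
	DEMO_FSYNC_ON,
	DEMO_FSYNC_BATCH,
	DEMO_FSYNC_INVALID
};

/* Simplified boolean check; git's real parser accepts more forms. */
static int demo_parse_bool(const char *v)
{
	if (!strcasecmp(v, "true") || !strcasecmp(v, "yes") ||
	    !strcasecmp(v, "on") || !strcmp(v, "1"))
		return 1;
	if (!*v || !strcasecmp(v, "false") || !strcasecmp(v, "no") ||
	    !strcasecmp(v, "off") || !strcmp(v, "0"))
		return 0;
	return -1;
}

/* value is NULL for a bare "fsyncObjectFiles" key with no "=". */
static enum demo_fsync_mode demo_parse_fsync_object_files(const char *value)
{
	int b;

	if (!value)
		return DEMO_FSYNC_ON;
	if (!strcmp(value, "batch"))	/* case-sensitive on purpose */
		return DEMO_FSYNC_BATCH;
	b = demo_parse_bool(value);
	if (b < 0)
		return DEMO_FSYNC_INVALID;
	return b ? DEMO_FSYNC_ON : DEMO_FSYNC_OFF;
}
```

Here "Batch" falls through to the boolean check and comes back as DEMO_FSYNC_INVALID. A real config callback would presumably report that as an error rather than ignore it silently.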
> Neeraj Singh (7):
>   object-file.c: do not rename in a temp odb
>   bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>   core.fsyncobjectfiles: batched disk flushes
>   update-index: use the bulk-checkin infrastructure
>   unpack-objects: use the bulk-checkin infrastructure
>   core.fsyncobjectfiles: tests for batch mode
>   core.fsyncobjectfiles: performance tests for add and stash
>
>  Documentation/config/core.txt |  29 +++++++--
>  Makefile                      |   6 ++
>  builtin/add.c                 |   1 +
>  builtin/unpack-objects.c      |   3 +
>  builtin/update-index.c        |   6 ++
>  bulk-checkin.c                |  92 +++++++++++++++++++++++---
>  bulk-checkin.h                |   2 +
>  cache.h                       |   8 ++-
>  config.c                      |   7 +-
>  config.mak.uname              |   1 +
>  configure.ac                  |   8 +++
>  environment.c                 |   6 +-
>  git-compat-util.h             |   7 ++
>  object-file.c                 | 118 +++++++++++++++++++++++++++++-----
>  object-store.h                |  22 +++++++
>  object.c                      |   2 +-
>  repository.c                  |   2 +
>  repository.h                  |   1 +
>  t/lib-unique-files.sh         |  36 +++++++++++
>  t/perf/p3700-add.sh           |  43 +++++++++++++
>  t/perf/p3900-stash.sh         |  46 +++++++++++++
>  t/t3700-add.sh                |  20 ++++++
>  t/t3903-stash.sh              |  14 ++++
>  t/t5300-pack-object.sh        |  30 +++++----
>  tmp-objdir.c                  |  20 +++++-
>  tmp-objdir.h                  |   6 ++
>  wrapper.c                     |  44 +++++++++++++
>  write-or-die.c                |   2 +-
>  28 files changed, 532 insertions(+), 50 deletions(-)
>  create mode 100644 t/lib-unique-files.sh
>  create mode 100755 t/perf/p3700-add.sh
>  create mode 100755 t/perf/p3900-stash.sh
>
>
> base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v5
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v5
> Pull-Request: https://github.com/git/git/pull/1076
>
> Range-diff vs v4:
>
>  -:  ----------- > 1:  95315f35a28 object-file.c: do not rename in a temp odb
>  1:  d5893e28df1 = 2:  df6fab94d67 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
>  2:  12cad737635 ! 3:  fe19cdfc930 core.fsyncobjectfiles: batched disk flushes
>      @@ Commit message
>
>           One major source of the cost of fsync is the implied flush of the
>           hardware writeback cache within the disk drive. Fortunately, Windows,
>      -    macOS, and Linux each offer mechanisms to write data from the filesystem
>      -    page cache without initiating a hardware flush.
>      +    and macOS offer mechanisms to write data from the filesystem page cache
>      +    without initiating a hardware flush. Linux has the sync_file_range API,
>      +    which issues a pagecache writeback request reliably after version 5.2.
>
>           This patch introduces a new 'core.fsyncObjectFiles = batch' option that
>      -    takes advantage of the bulk-checkin infrastructure to batch up hardware
>      -    flushes.
>      +    batches up hardware flushes. It hooks into the bulk-checkin plugging and
>      +    unplugging functionality and takes advantage of tmp-objdir.
>
>      -    When the new mode is enabled we do the following for new objects:
>      -
>      -    1. Create a tmp_obj_XXXX file and write the object data to it.
>      +    When the new mode is enabled we do the following for each new object:
>      +    1. Create the object in a tmp-objdir.
>           2. Issue a pagecache writeback request and wait for it to complete.
>      -    3. Record the tmp name and the final name in the bulk-checkin state for
>      -       later rename.
>
>      -    At the end of the entire transaction we:
>      -    1. Issue a fsync against the lock file to flush the hardware writeback
>      -       cache, which should by now have processed the tmp file writes.
>      -    2. Rename all of the temp files to their final names.
>      +    At the end of the entire transaction when unplugging bulk checkin we:
>      +    1. Issue an fsync against a dummy file to flush the hardware writeback
>      +       cache, which should by now have processed the tmp-objdir writes.
>      +    2. Rename all of the tmp-objdir files to their final names.
>           3. When updating the index and/or refs, we assume that Git will issue
>      -       another fsync internal to that operation.
>      +       another fsync internal to that operation. This is not the case today,
>      +       but may be a good extension to those components.
>
>           On a filesystem with a singular journal that is updated during name
>      -    operations (e.g. create, link, rename, etc), such as NTFS and HFS+, we
>      +    operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we
>           would expect the fsync to trigger a journal writeout so that this
>           sequence is enough to ensure that the user's data is durable by the time
>           the git command returns.
>      @@ Documentation/config/core.txt: core.whitespace::
>       + A value indicating the level of effort Git will expend in
>       + trying to make objects added to the repo durable in the event
>       + of an unclean system shutdown. This setting currently only
>      -+ controls the object store, so updates to any refs or the
>      -+ index may not be equally durable.
>      ++ controls loose objects in the object store, so updates to any
>      ++ refs or the index may not be equally durable.
>       ++
>       +* `false` allows data to remain in file system caches according to
>       +  operating system policy, whence it may be lost if the system loses power
>       +  or crashes.
>      -+* `true` triggers a data integrity flush for each object added to the
>      ++* `true` triggers a data integrity flush for each loose object added to the
>       +  object store. This is the safest setting that is likely to ensure durability
>       +  across all operating systems and file systems that honor the 'fsync' system
>       +  call. However, this setting comes with a significant performance cost on
>      -+  common hardware.
>      ++  common hardware. Git does not currently fsync parent directories for
>      ++  newly-added files, so some filesystems may still allow data to be lost on
>      ++  system crash.
>       +* `batch` enables an experimental mode that uses interfaces available in some
>      -+  operating systems to write object data with a minimal set of FLUSH CACHE
>      -+  (or equivalent) commands sent to the storage controller. If the operating
>      -+  system interfaces are not available, this mode behaves the same as `true`.
>      -+  This mode is expected to be safe on macOS for repos stored on HFS+ or APFS
>      -+  filesystems and on Windows for repos stored on NTFS or ReFS.
>      ++  operating systems to write loose object data with a minimal set of FLUSH
>      ++  CACHE (or equivalent) commands sent to the storage controller. If the
>      ++  operating system interfaces are not available, this mode behaves the same as
>      ++  `true`. This mode is expected to be as safe as `true` on macOS for repos
>      ++  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
>      ++  ReFS.
>
>        core.preloadIndex::
>         Enable parallel index preload for operations like 'git diff'
>      @@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
>
>         if (chmod_arg && pathspec.nr)
>                 exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
>      -- unplug_bulk_checkin();
>       +
>      -+ unplug_bulk_checkin(&lock_file);
>      +  unplug_bulk_checkin();
>
>        finish:
>      -  if (write_locked_index(&the_index, &lock_file,
>
>        ## bulk-checkin.c ##
>       @@
>      @@ bulk-checkin.c
>        #include "pack.h"
>        #include "strbuf.h"
>       +#include "string-list.h"
>      ++#include "tmp-objdir.h"
>        #include "packfile.h"
>        #include "object-store.h"
>
>        static int bulk_checkin_plugged;
>      -
>      -+static struct string_list bulk_fsync_state = STRING_LIST_INIT_DUP;
>      ++static int needs_batch_fsync;
>       +
>      ++static struct tmp_objdir *bulk_fsync_objdir;
>      +
>        static struct bulk_checkin_state {
>         char *pack_tmp_name;
>      -  struct hashfile *f;
>       @@ bulk-checkin.c: clear_exit:
>         reprepare_packed_git(the_repository);
>        }
>
>      -+static void do_sync_and_rename(struct string_list *fsync_state, struct lock_file *lock_file)
>      ++/*
>      ++ * Cleanup after batch-mode fsync_object_files.
>      ++ */
>      ++static void do_batch_fsync(void)
>       +{
>      -+ if (fsync_state->nr) {
>      -+         struct string_list_item *rename;
>      -+
>      -+         /*
>      -+          * Issue a full hardware flush against the lock file to ensure
>      -+          * that all objects are durable before any renames occur.
>      -+          * The code in fsync_and_close_loose_object_bulk_checkin has
>      -+          * already ensured that writeout has occurred, but it has not
>      -+          * flushed any writeback cache in the storage hardware.
>      -+          */
>      -+         fsync_or_die(get_lock_file_fd(lock_file), get_lock_file_path(lock_file));
>      -+
>      -+         for_each_string_list_item(rename, fsync_state) {
>      -+                 const char *src = rename->string;
>      -+                 const char *dst = rename->util;
>      -+
>      -+                 if (finalize_object_file(src, dst))
>      -+                         die_errno(_("could not rename '%s' to '%s'"), src, dst);
>      -+         }
>      -+
>      -+         string_list_clear(fsync_state, 1);
>      ++ /*
>      ++  * Issue a full hardware flush against a temporary file to ensure
>      ++  * that all objects are durable before any renames occur.  The code in
>      ++  * fsync_loose_object_bulk_checkin has already issued a writeout
>      ++  * request, but it has not flushed any writeback cache in the storage
>      ++  * hardware.
>      ++  */
>      ++
>      ++ if (needs_batch_fsync) {
>      ++         struct strbuf temp_path = STRBUF_INIT;
>      ++         struct tempfile *temp;
>      ++
>      ++         strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
>      ++         temp = xmks_tempfile(temp_path.buf);
>      ++         fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
>      ++         delete_tempfile(&temp);
>      ++         strbuf_release(&temp_path);
>       + }
>      ++
>      ++ if (bulk_fsync_objdir)
>      ++         tmp_objdir_migrate(bulk_fsync_objdir);
>       +}
>       +
>        static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
>      @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
>         return 0;
>        }
>
>      -+static void add_rename_bulk_checkin(struct string_list *fsync_state,
>      -+                             const char *src, const char *dst)
>      ++void fsync_loose_object_bulk_checkin(int fd)
>       +{
>      -+ string_list_insert(fsync_state, src)->util = xstrdup(dst);
>      -+}
>      -+
>      -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
>      -+                                       const char *filename, time_t mtime)
>      -+{
>      -+ int do_finalize = 1;
>      -+ int ret = 0;
>      -+
>      -+ if (fsync_object_files != FSYNC_OBJECT_FILES_OFF) {
>      -+         /*
>      -+          * If we have a plugged bulk checkin, we issue a call that
>      -+          * cleans the filesystem page cache but avoids a hardware flush
>      -+          * command. Later on we will issue a single hardware flush
>      -+          * before renaming files as part of do_sync_and_rename.
>      -+          */
>      -+         if (bulk_checkin_plugged &&
>      -+             fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
>      -+             git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
>      -+                 add_rename_bulk_checkin(&bulk_fsync_state, tmpfile, filename);
>      -+                 do_finalize = 0;
>      -+
>      -+         } else {
>      -+                 fsync_or_die(fd, "loose object file");
>      -+         }
>      -+ }
>      -+
>      -+ if (close(fd))
>      -+         die_errno(_("error when closing loose object file"));
>      -+
>      -+ if (mtime) {
>      -+         struct utimbuf utb;
>      -+         utb.actime = mtime;
>      -+         utb.modtime = mtime;
>      -+         if (utime(tmpfile, &utb) < 0)
>      -+                 warning_errno(_("failed utime() on %s"), tmpfile);
>      ++ assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
>      ++
>      ++ /*
>      ++  * If we have a plugged bulk checkin, we issue a call that
>      ++  * cleans the filesystem page cache but avoids a hardware flush
>      ++  * command. Later on we will issue a single hardware flush
>      ++  * before as part of do_batch_fsync.
>      ++  */
>      ++ if (bulk_checkin_plugged &&
>      ++     git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
>      ++         assert(the_repository->objects->odb->is_temp);
>      ++         if (!needs_batch_fsync)
>      ++                 needs_batch_fsync = 1;
>      ++ } else {
>      ++         fsync_or_die(fd, "loose object file");
>       + }
>      -+
>      -+ if (do_finalize)
>      -+         ret = finalize_object_file(tmpfile, filename);
>      -+
>      -+ return ret;
>       +}
>       +
>        int index_bulk_checkin(struct object_id *oid,
>                        int fd, size_t size, enum object_type type,
>                        const char *path, unsigned flags)
>      -@@ bulk-checkin.c: void plug_bulk_checkin(void)
>      +@@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
>      + void plug_bulk_checkin(void)
>      + {
>      +  assert(!bulk_checkin_plugged);
>      ++
>      ++ /*
>      ++  * Create a temporary object directory if the current
>      ++  * object directory is not already temporary.
>      ++  */
>      ++ if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
>      ++     !the_repository->objects->odb->is_temp) {
>      ++         bulk_fsync_objdir = tmp_objdir_create();
>      ++         if (!bulk_fsync_objdir)
>      ++                 die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
>      ++
>      ++         tmp_objdir_replace_main_odb(bulk_fsync_objdir);
>      ++ }
>      ++
>         bulk_checkin_plugged = 1;
>        }
>
>      --void unplug_bulk_checkin(void)
>      -+void unplug_bulk_checkin(struct lock_file *lock_file)
>      - {
>      -  assert(bulk_checkin_plugged);
>      +@@ bulk-checkin.c: void unplug_bulk_checkin(void)
>         bulk_checkin_plugged = 0;
>         if (bulk_checkin_state.f)
>                 finish_bulk_checkin(&bulk_checkin_state);
>       +
>      -+ do_sync_and_rename(&bulk_fsync_state, lock_file);
>      ++ do_batch_fsync();
>        }
>
>        ## bulk-checkin.h ##
>      @@ bulk-checkin.h
>
>        #include "cache.h"
>
>      -+int fsync_and_close_loose_object_bulk_checkin(int fd, const char *tmpfile,
>      -+                                       const char *filename, time_t mtime);
>      ++void fsync_loose_object_bulk_checkin(int fd);
>       +
>        int index_bulk_checkin(struct object_id *oid,
>                        int fd, size_t size, enum object_type type,
>                        const char *path, unsigned flags);
>      -
>      - void plug_bulk_checkin(void);
>      --void unplug_bulk_checkin(void);
>      -+void unplug_bulk_checkin(struct lock_file *);
>      -
>      - #endif
>
>        ## cache.h ##
>       @@ cache.h: void reset_shared_repository(void);
>      @@ cache.h: void reset_shared_repository(void);
>        extern char *git_replace_ref_base;
>
>       -extern int fsync_object_files;
>      -+enum FSYNC_OBJECT_FILES_MODE {
>      ++enum fsync_object_files_mode {
>       +    FSYNC_OBJECT_FILES_OFF,
>       +    FSYNC_OBJECT_FILES_ON,
>       +    FSYNC_OBJECT_FILES_BATCH
>       +};
>       +
>      -+extern enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>      ++extern enum fsync_object_files_mode fsync_object_files;
>        extern int core_preload_index;
>        extern int precomposed_unicode;
>        extern int protect_hfs;
>      @@ environment.c: const char *git_hooks_path;
>        int core_compression_level;
>        int pack_compression_level = Z_DEFAULT_COMPRESSION;
>       -int fsync_object_files;
>      -+enum FSYNC_OBJECT_FILES_MODE fsync_object_files;
>      ++enum fsync_object_files_mode fsync_object_files;
>        size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>        size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
>        size_t delta_base_cache_limit = 96 * 1024 * 1024;
>      @@ git-compat-util.h: __attribute__((format (printf, 1, 2))) NORETURN
>         * Returns 0 on success, which includes trying to unlink an object that does
>
>        ## object-file.c ##
>      -@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>      -  return 0;
>      +@@ object-file.c: void add_to_alternates_memory(const char *reference)
>      +                       '\n', NULL, 0);
>        }
>
>      --/* Finalize a file on disk, and close it. */
>      --static void close_loose_object(int fd)
>      --{
>      ++struct object_directory *set_temporary_main_odb(const char *dir)
>      ++{
>      ++ struct object_directory *main_odb, *new_odb, *old_next;
>      ++
>      ++ /*
>      ++  * Make sure alternates are initialized, or else our entry may be
>      ++  * overwritten when they are.
>      ++  */
>      ++ prepare_alt_odb(the_repository);
>      ++
>      ++ /* Copy the existing object directory and make it an alternate. */
>      ++ main_odb = the_repository->objects->odb;
>      ++ new_odb = xmalloc(sizeof(*new_odb));
>      ++ *new_odb = *main_odb;
>      ++ *the_repository->objects->odb_tail = new_odb;
>      ++ the_repository->objects->odb_tail = &(new_odb->next);
>      ++ new_odb->next = NULL;
>      ++
>      ++ /*
>      ++  * Reinitialize the main odb with the specified path, being careful
>      ++  * to keep the next pointer value.
>      ++  */
>      ++ old_next = main_odb->next;
>      ++ memset(main_odb, 0, sizeof(*main_odb));
>      ++ main_odb->next = old_next;
>      ++ main_odb->is_temp = 1;
>      ++ main_odb->path = xstrdup(dir);
>      ++ return new_odb;
>      ++}
>      ++
>      ++void restore_main_odb(struct object_directory *odb)
>      ++{
>      ++ struct object_directory **prev, *main_odb;
>      ++
>      ++ /* Unlink the saved previous main ODB from the list. */
>      ++ prev = &the_repository->objects->odb->next;
>      ++ assert(*prev);
>      ++ while (*prev != odb) {
>      ++         prev = &(*prev)->next;
>      ++ }
>      ++ *prev = odb->next;
>      ++ if (*prev == NULL)
>      ++         the_repository->objects->odb_tail = prev;
>      ++
>      ++ /*
>      ++  * Restore the data from the old main odb, being careful to
>      ++  * keep the next pointer value
>      ++  */
>      ++ main_odb = the_repository->objects->odb;
>      ++ SWAP(*main_odb, *odb);
>      ++ main_odb->next = odb->next;
>      ++ free_object_directory(odb);
>      ++}
>      ++
>      + /*
>      +  * Compute the exact path an alternate is at and returns it. In case of
>      +  * error NULL is returned and the human readable error is added to `err`
>      +@@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>      + /* Finalize a file on disk, and close it. */
>      + static void close_loose_object(int fd)
>      + {
>       - if (fsync_object_files)
>      --         fsync_or_die(fd, "loose object file");
>      -- if (close(fd) != 0)
>      --         die_errno(_("error when closing loose object file"));
>      --}
>      --
>      - /* Size of directory component, including the ending '/' */
>      - static inline int directory_size(const char *filename)
>      ++ switch (fsync_object_files) {
>      ++ case FSYNC_OBJECT_FILES_OFF:
>      ++         break;
>      ++ case FSYNC_OBJECT_FILES_ON:
>      +          fsync_or_die(fd, "loose object file");
>      ++         break;
>      ++ case FSYNC_OBJECT_FILES_BATCH:
>      ++         fsync_loose_object_bulk_checkin(fd);
>      ++         break;
>      ++ default:
>      ++         BUG("Invalid fsync_object_files mode.");
>      ++ }
>      ++
>      +  if (close(fd) != 0)
>      +          die_errno(_("error when closing loose object file"));
>      + }
>      +
>      + ## object-store.h ##
>      +@@ object-store.h: void add_to_alternates_file(const char *dir);
>      +  */
>      + void add_to_alternates_memory(const char *dir);
>      +
>      ++/*
>      ++ * Replace the current main object directory with the specified temporary
>      ++ * object directory. We make a copy of the former main object directory,
>      ++ * add it as an in-memory alternate, and return the copy so that it can
>      ++ * be restored via restore_main_odb.
>      ++ */
>      ++struct object_directory *set_temporary_main_odb(const char *dir);
>      ++
>      ++/*
>      ++ * Restore a previous ODB replaced by set_temporary_main_odb.
>      ++ */
>      ++void restore_main_odb(struct object_directory *odb);
>      ++
>      + /*
>      +  * Populate and return the loose object cache array corresponding to the
>      +  * given object ID.
>      +@@ object-store.h: struct oidtree *odb_loose_cache(struct object_directory *odb,
>      + /* Empty the loose object cache for the specified object directory. */
>      + void odb_clear_loose_cache(struct object_directory *odb);
>      +
>      ++/* Clear and free the specified object directory */
>      ++void free_object_directory(struct object_directory *odb);
>      ++
>      + struct packed_git {
>      +  struct hashmap_entry packmap_ent;
>      +  struct packed_git *next;
>      +
>      + ## object.c ##
>      +@@ object.c: struct raw_object_store *raw_object_store_new(void)
>      +  return o;
>      + }
>      +
>      +-static void free_object_directory(struct object_directory *odb)
>      ++void free_object_directory(struct object_directory *odb)
>        {
>      -@@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
>      -          die(_("confused by unstable object source data for %s"),
>      -              oid_to_hex(oid));
>      +  free(odb->path);
>      +  odb_clear_loose_cache(odb);
>      +
>      + ## tmp-objdir.c ##
>      +@@
>      + struct tmp_objdir {
>      +  struct strbuf path;
>      +  struct strvec env;
>      ++ struct object_directory *prev_main_odb;
>      + };
>
>      -- close_loose_object(fd);
>      --
>      -- if (mtime) {
>      --         struct utimbuf utb;
>      --         utb.actime = mtime;
>      --         utb.modtime = mtime;
>      --         if (utime(tmp_file.buf, &utb) < 0)
>      --                 warning_errno(_("failed utime() on %s"), tmp_file.buf);
>      -- }
>      --
>      -- return finalize_object_file(tmp_file.buf, filename.buf);
>      -+ return fsync_and_close_loose_object_bulk_checkin(fd, tmp_file.buf,
>      -+                                                  filename.buf, mtime);
>      + /*
>      +@@ tmp-objdir.c: static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
>      +   * freeing memory; it may cause a deadlock if the signal
>      +   * arrived while libc's allocator lock is held.
>      +   */
>      +- if (!on_signal)
>      ++ if (!on_signal) {
>      ++         if (t->prev_main_odb)
>      ++                 restore_main_odb(t->prev_main_odb);
>      +          tmp_objdir_free(t);
>      ++ }
>      ++
>      +  return err;
>        }
>
>      - static int freshen_loose_object(const struct object_id *oid)
>      +@@ tmp-objdir.c: struct tmp_objdir *tmp_objdir_create(void)
>      +  t = xmalloc(sizeof(*t));
>      +  strbuf_init(&t->path, 0);
>      +  strvec_init(&t->env);
>      ++ t->prev_main_odb = NULL;
>      +
>      +  strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
>      +
>      +@@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
>      +  if (!t)
>      +          return 0;
>      +
>      ++ if (t->prev_main_odb) {
>      ++         restore_main_odb(t->prev_main_odb);
>      ++         t->prev_main_odb = NULL;
>      ++ }
>      ++
>      +  strbuf_addbuf(&src, &t->path);
>      +  strbuf_addstr(&dst, get_object_directory());
>      +
>      +@@ tmp-objdir.c: void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
>      + {
>      +  add_to_alternates_memory(t->path.buf);
>      + }
>      ++
>      ++void tmp_objdir_replace_main_odb(struct tmp_objdir *t)
>      ++{
>      ++ if (t->prev_main_odb)
>      ++         BUG("the main object database is already replaced");
>      ++ t->prev_main_odb = set_temporary_main_odb(t->path.buf);
>      ++}
>      +
>      + ## tmp-objdir.h ##
>      +@@ tmp-objdir.h: int tmp_objdir_destroy(struct tmp_objdir *);
>      +  */
>      + void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
>      +
>      ++/*
>      ++ * Replaces the main object store in the current process with the temporary
>      ++ * object directory and makes the former main object store an alternate.
>      ++ */
>      ++void tmp_objdir_replace_main_odb(struct tmp_objdir *);
>      ++
>      + #endif /* TMP_OBJDIR_H */
>
>        ## wrapper.c ##
>       @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
>  3:  a5b3e21b762 < -:  ----------- core.fsyncobjectfiles: add windows support for batch mode
>  4:  f7f756f3932 ! 4:  485b4a767df update-index: use the bulk-checkin infrastructure
>      @@ Commit message
>           There is some risk with this change, since under batch fsync, the object
>           files will not be available until the update-index is entirely complete.
>           This usage is unlikely, since any tool invoking update-index and
>      -    expecting to see objects would have to snoop the output of --verbose to
>      -    find out when update-index has actually processed a given path.
>      -    Additionally the index is locked for the duration of the update.
>      +    expecting to see objects would have to synchronize with the update-index
>      +    process after passing it a file path.
>
>           Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
>
>      @@ builtin/update-index.c
>        #include "lockfile.h"
>        #include "quote.h"
>       @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
>      -          struct strbuf unquoted = STRBUF_INIT;
>
>      -          setup_work_tree();
>      -+         plug_bulk_checkin();
>      -          while (getline_fn(&buf, stdin) != EOF) {
>      -                  char *p;
>      -                  if (!nul_term_line && buf.buf[0] == '"') {
>      +  the_index.updated_skipworktree = 1;
>      +
>      ++ /* we might be adding many objects to the object database */
>      ++ plug_bulk_checkin();
>      ++
>      +  /*
>      +   * Custom copy of parse_options() because we want to handle
>      +   * filename arguments as they come.
>       @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
>      -                          chmod_path(set_executable_bit, p);
>      -                  free(p);
>      -          }
>      -+         unplug_bulk_checkin(&lock_file);
>      -          strbuf_release(&unquoted);
>                 strbuf_release(&buf);
>         }
>      +
>      ++ /* by now we must have added all of the new objects */
>      ++ unplug_bulk_checkin();
>      +  if (split_index > 0) {
>      +          if (git_config_get_split_index() == 0)
>      +                  warning(_("core.splitIndex is set to false; "
>  -:  ----------- > 5:  889e7668760 unpack-objects: use the bulk-checkin infrastructure
>  5:  afb0028e796 ! 6:  0f2e3b25759 core.fsyncobjectfiles: tests for batch mode
>      @@ Metadata
>        ## Commit message ##
>           core.fsyncobjectfiles: tests for batch mode
>
>      -    Add test cases to exercise batch mode for 'git add'
>      -    and 'git stash'. These tests ensure that the added
>      -    data winds up in the object database.
>      +    Add test cases to exercise batch mode for:
>      +     * 'git add'
>      +     * 'git stash'
>      +     * 'git update-index'
>      +     * 'git unpack-objects'
>
>      -    I verified the tests by introducing an incorrect rename
>      -    in do_sync_and_rename.
>      +    These tests ensure that the added data winds up in the object database.
>      +
>      +    In this change we introduce a new test helper lib-unique-files.sh. The
>      +    goal of this library is to create a tree of files that have different
>      +    oids from any other files that may have been created in the current test
>      +    repo. This helps us avoid missing validation of an object being added due
>      +    to it already being in the repo.
>
>           Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
>
>      @@ t/lib-unique-files.sh (new)
>       @@
>       +# Helper to create files with unique contents
>       +
>      -+test_create_unique_files_base__=$(date -u)
>      -+test_create_unique_files_counter__=0
>       +
>       +# Create multiple files with unique contents. Takes the number of
>       +# directories, the number of files in each directory, and the base
>       +# directory.
>       +#
>      -+# test_create_unique_files 2 3 . -- Creates 2 directories with 3 files
>      -+#                                    each in the specified directory, all
>      -+#                                    with unique contents.
>      ++# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
>      ++#                                         each in my_dir, all with unique
>      ++#                                         contents.
>       +
>       +test_create_unique_files() {
>       + test "$#" -ne 3 && BUG "3 param"
>      @@ t/lib-unique-files.sh (new)
>       + local dirs=$1
>       + local files=$2
>       + local basedir=$3
>      ++ local counter=0
>      ++ test_tick
>      ++ local basedata=$test_tick
>      ++
>       +
>      -+ rm -rf $basedir >/dev/null
>      ++ rm -rf $basedir
>       +
>       + for i in $(test_seq $dirs)
>       + do
>       +         local dir=$basedir/dir$i
>       +
>      -+         mkdir -p "$dir" > /dev/null
>      ++         mkdir -p "$dir"
>       +         for j in $(test_seq $files)
>       +         do
>      -+                 test_create_unique_files_counter__=$((test_create_unique_files_counter__ + 1))
>      -+                 echo "$test_create_unique_files_base__.$test_create_unique_files_counter__"  >"$dir/file$j.txt"
>      ++                 counter=$((counter + 1))
>      ++                 echo "$basedata.$counter"  >"$dir/file$j.txt"
>       +         done
>       + done
>       +}
>      @@ t/t3700-add.sh: test_expect_success \
>       + rm -f fsynced_files &&
>       + git ls-files --stage fsync-files/ > fsynced_files &&
>       + test_line_count = 8 fsynced_files &&
>      -+ cat fsynced_files | awk '{print \$2}' | xargs -n1 git cat-file -e
>      ++ awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
>      ++"
>      ++
>      ++test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
>      ++ test_create_unique_files 2 4 fsync-files2 &&
>      ++ find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
>      ++ rm -f fsynced_files2 &&
>      ++ git ls-files --stage fsync-files2/ > fsynced_files2 &&
>      ++ test_line_count = 8 fsynced_files2 &&
>      ++ awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
>       +"
>       +
>        test_expect_success \
>      @@ t/t3903-stash.sh: test_expect_success 'stash handles skip-worktree entries nicel
>       + # which contains the untracked files
>       + git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
>       + test_line_count = 8 fsynced_files &&
>      -+ cat fsynced_files | awk '{print \$3}' | xargs -n1 git cat-file -e
>      ++ awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
>       +"
>       +
>       +
>        test_expect_success 'stash -c stash.useBuiltin=false warning ' '
>         expected="stash.useBuiltin support has been removed" &&
>
>      +
>      + ## t/t5300-pack-object.sh ##
>      +@@ t/t5300-pack-object.sh: test_expect_success 'pack-objects with bogus arguments' '
>      +
>      + check_unpack () {
>      +  test_when_finished "rm -rf git2" &&
>      +- git init --bare git2 &&
>      +- git -C git2 unpack-objects -n <"$1".pack &&
>      +- git -C git2 unpack-objects <"$1".pack &&
>      +- (cd .git && find objects -type f -print) |
>      +- while read path
>      +- do
>      +-         cmp git2/$path .git/$path || {
>      +-                 echo $path differs.
>      +-                 return 1
>      +-         }
>      +- done
>      ++ git $2 init --bare git2 &&
>      ++ (
>      ++         git $2 -C git2 unpack-objects -n <"$1".pack &&
>      ++         git $2 -C git2 unpack-objects <"$1".pack &&
>      ++         git $2 -C git2 cat-file --batch-check="%(objectname)"
>      ++ ) <obj-list >current &&
>      ++ cmp obj-list current
>      + }
>      +
>      + test_expect_success 'unpack without delta' '
>      +  check_unpack test-1-${packname_1}
>      + '
>      +
>      ++test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
>      ++ check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
>      ++'
>      ++
>      + test_expect_success 'pack with REF_DELTA' '
>      +  packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
>      +  check_deltas stderr -gt 0
>      +@@ t/t5300-pack-object.sh: test_expect_success 'unpack with REF_DELTA' '
>      +  check_unpack test-2-${packname_2}
>      + '
>      +
>      ++test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
>      ++       check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
>      ++'
>      ++
>      + test_expect_success 'pack with OFS_DELTA' '
>      +  packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
>      +                  <obj-list 2>stderr) &&
>      +@@ t/t5300-pack-object.sh: test_expect_success 'unpack with OFS_DELTA' '
>      +  check_unpack test-3-${packname_3}
>      + '
>      +
>      ++test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
>      ++       check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
>      ++'
>      ++
>      + test_expect_success 'compare delta flavors' '
>      +  perl -e '\''
>      +          defined($_ = -s $_) or die for @ARGV;
>  6:  3e6b80b5fa2 = 7:  6543564376a core.fsyncobjectfiles: performance tests for add and stash
>
> --
> gitgitgadget

Apologies for the spam; I'll be submitting a v6 shortly since there
were several things wrong with this version.


* [PATCH v6 0/8] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                           ` (7 preceding siblings ...)
  2021-09-24 23:31         ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
@ 2021-09-24 23:53         ` Neeraj K. Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
                             ` (8 more replies)
  8 siblings, 9 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

Thanks to everyone for review so far!

v5 was a bit of a dud, with some issues that I only noticed after
submitting. v6 changes:

 * re-add Windows support
 * fix minor formatting issues
 * reset git author and commit dates which got messed up

Changes since v4, all in response to review feedback from Ævar Arnfjörð
Bjarmason:

 * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
   to add a statement about not fsyncing parent directories.
   
   * I still don't want to make any promises on behalf of the Linux FS developers
     in the documentation. However, according to [v4.1] and my understanding
     of how XFS journals are documented to work, it looks like recent versions
     of Linux running on XFS should be as safe as Windows or macOS in 'batch'
     mode. I don't know about ext4, since it's not clear to me when metadata
     updates are made visible to the journal.
   

 * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
   pointed out, this lets us access the added loose objects immediately,
   rather than only after unplugging the bulk checkin. This is a hard
   requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
   
   * As a preparatory patch, the object-file code now doesn't do a rename if it's in a
     tmp objdir (as determined by the quarantine environment variable).
   
   * I added support to the tmp-objdir lib to replace the 'main' writable odb.
   
   * Instead of using a lockfile for the final full fsync, we now use a new dummy
     temp file. Doing that makes the below unpack-objects change easier.
   

 * Add bulk-checkin support to unpack-objects, which is used in fetch and
   push. In addition to making those operations faster, it allows us to
   directly compare performance of packfiles against loose objects. Please
   see [v4.2] for a measurement of 'git push' to a local upstream with
   different numbers of unique new files.

 * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.

 * Remove comment with link to NtFlushBuffersFileEx documentation.

 * Make t/lib-unique-files.sh a bit cleaner. We are still creating unique
   contents, but now this uses test_tick, so it should be deterministic from
   run to run.

 * Ensure there are tests for all of the modified commands. Make the
   unpack-objects tests validate that the unpacked objects are really
   available in the ODB.
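On the parent-directory caveat in the documentation change: a newly created file's directory entry is only guaranteed to be durable once the containing directory itself has been synced. The following minimal POSIX sketch shows that extra step. The series does not do this, and the helper fsync_parent_dir() is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: make a new file's directory entry durable by
 * fsyncing the directory that contains it. Returns 0 on success, -1 on error.
 */
static int fsync_parent_dir(const char *path)
{
	char dir[4096];
	const char *slash = strrchr(path, '/');
	size_t len;
	int fd, ret;

	if (!slash) {
		strcpy(dir, ".");	/* bare filename: current directory */
	} else {
		len = (size_t)(slash - path);
		if (len == 0)
			len = 1;	/* file lives directly in "/" */
		if (len >= sizeof(dir))
			return -1;
		memcpy(dir, path, len);
		dir[len] = '\0';
	}

	fd = open(dir, O_RDONLY);	/* directories are opened read-only */
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	return ret;
}
```

A caller would invoke this after the file's own fsync and after any rename into place, because the rename itself is a change to the directory.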

References for v4: [v4.1]
https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/#t

[v4.2]
https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117

Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
   accept no value to mean "true" and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for Windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.
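The option parsing described above (no value means "true", and only lowercase 'batch' selects batch mode) can be sketched in C. It mirrors the config.c hunk in patch 3/8, but parse_bool() is a simplified, hypothetical stand-in for Git's git_config_bool(), which accepts more spellings (e.g. arbitrary integers) and dies on invalid input rather than returning -1.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Hypothetical, simplified boolean parser; returns -1 on bad input. */
static int parse_bool(const char *value)
{
	if (!value)
		return 1;	/* "fsyncObjectFiles" with no "=value" means true */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!*value || !strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

static int parse_fsync_object_files(const char *value,
				    enum fsync_object_files_mode *out)
{
	int b;

	/* Case-sensitive on purpose: "Batch" is not accepted. */
	if (value && !strcmp(value, "batch")) {
		*out = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = parse_bool(value);
	if (b < 0)
		return -1;
	*out = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```

Checking for "batch" before falling back to boolean parsing keeps every existing true/false spelling working unchanged.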

Neeraj Singh (8):
  object-file.c: do not rename in a temp odb
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       |  29 +++++--
 Makefile                            |   6 ++
 builtin/unpack-objects.c            |   3 +
 builtin/update-index.c              |   6 ++
 bulk-checkin.c                      |  92 +++++++++++++++++++---
 bulk-checkin.h                      |   2 +
 cache.h                             |   8 +-
 compat/mingw.h                      |   3 +
 compat/win32/flush.c                |  28 +++++++
 config.c                            |   7 +-
 config.mak.uname                    |   3 +
 configure.ac                        |   8 ++
 contrib/buildsystems/CMakeLists.txt |   3 +-
 environment.c                       |   6 +-
 git-compat-util.h                   |   7 ++
 object-file.c                       | 118 ++++++++++++++++++++++++----
 object-store.h                      |  22 ++++++
 object.c                            |   2 +-
 repository.c                        |   2 +
 repository.h                        |   1 +
 t/lib-unique-files.sh               |  36 +++++++++
 t/perf/p3700-add.sh                 |  43 ++++++++++
 t/perf/p3900-stash.sh               |  46 +++++++++++
 t/t3700-add.sh                      |  20 +++++
 t/t3903-stash.sh                    |  14 ++++
 t/t5300-pack-object.sh              |  30 ++++---
 tmp-objdir.c                        |  20 ++++-
 tmp-objdir.h                        |   6 ++
 wrapper.c                           |  48 +++++++++++
 write-or-die.c                      |   2 +-
 30 files changed, 570 insertions(+), 51 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: 8b7c11b8668b4e774f81a9f0b4c30144b818f1d1
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v6
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v5:

 1:  95315f35a28 = 1:  e4081f81f6a object-file.c: do not rename in a temp odb
 2:  df6fab94d67 = 2:  ebba65e040c bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 3:  fe19cdfc930 ! 3:  543ea356934 core.fsyncobjectfiles: batched disk flushes
     @@ Makefile: ifdef HAVE_CLOCK_MONOTONIC
       	EXTLIBS += -lrt
       endif
      
     - ## builtin/add.c ##
     -@@ builtin/add.c: int cmd_add(int argc, const char **argv, const char *prefix)
     - 
     - 	if (chmod_arg && pathspec.nr)
     - 		exit_status |= chmod_pathspec(&pathspec, chmod_arg[0], show_only);
     -+
     - 	unplug_bulk_checkin();
     - 
     - finish:
     -
       ## bulk-checkin.c ##
      @@
        */
 -:  ----------- > 4:  bdb99822f8c core.fsyncobjectfiles: add windows support for batch mode
 4:  485b4a767df ! 5:  92e18cedab0 update-index: use the bulk-checkin infrastructure
     @@ Commit message
      
          This change enables bulk-checkin for update-index infrastructure to
          speed up adding new objects to the object database by leveraging the
     -    pack functionality and the new bulk-fsync functionality. This mode
     -    is enabled when passing paths to update-index via the --stdin flag,
     -    as is done by 'git stash'.
     +    pack functionality and the new bulk-fsync functionality.
      
          There is some risk with this change, since under batch fsync, the object
          files will not be available until the update-index is entirely complete.
 5:  889e7668760 = 6:  e3c5a11f225 unpack-objects: use the bulk-checkin infrastructure
 6:  0f2e3b25759 = 7:  385199354fa core.fsyncobjectfiles: tests for batch mode
 7:  6543564376a = 8:  504bcc95c56 core.fsyncobjectfiles: performance tests for add and stash

-- 
gitgitgadget


* [PATCH v6 1/8] object-file.c: do not rename in a temp odb
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                             ` (7 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH
being set, create object files with their final names. This avoids
an extra rename beyond what is needed to merge the temporary ODB in
tmp_objdir_migrate.

Creating an object file with the expected final name should be okay
since the git process writing to the temporary object store is the
only writer, and it only invokes write_loose_object/create_object_file
after checking that the object doesn't exist.
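A minimal POSIX sketch of the two creation strategies, assuming the caller already knows whether the object directory is temporary (in the patch that comes from GIT_QUARANTINE_PATH). The function names are hypothetical. As in the patch, an empty temporary name signals that no rename is needed. Unlike Git, this sketch places the temporary file next to the final name and uses a plain rename() rather than trying link() first.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Temporary ODB: the single writer creates the file under its final name,
 * and O_EXCL makes a collision fail instead of overwriting. Otherwise a
 * unique temporary name is created for a later rename into place.
 */
static int create_object_file(const char *final_path, int odb_is_temp,
			      char *tmp_out, size_t tmp_size)
{
	if (odb_is_temp) {
		if (tmp_size)
			tmp_out[0] = '\0';	/* empty: nothing to rename later */
		return open(final_path, O_CREAT | O_EXCL | O_RDWR, 0444);
	}
	if (snprintf(tmp_out, tmp_size, "%s.tmp_XXXXXX", final_path) >= (int)tmp_size)
		return -1;
	return mkstemp(tmp_out);
}

/* Move the object into place unless it was created there directly. */
static int finalize(const char *tmp, const char *final_path)
{
	if (!*tmp)
		return 0;
	return rename(tmp, final_path);
}
```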

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 environment.c  |  4 ++++
 object-file.c  | 51 ++++++++++++++++++++++++++++++++++----------------
 object-store.h |  6 ++++++
 repository.c   |  2 ++
 repository.h   |  1 +
 5 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/environment.c b/environment.c
index d6b22ede7ea..d9ba68402e9 100644
--- a/environment.c
+++ b/environment.c
@@ -177,6 +177,10 @@ void setup_git_env(const char *git_dir)
 	args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT);
 	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
 	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
+	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+		args.object_dir_is_temp = 1;
+	}
+
 	repo_set_gitdir(the_repository, git_dir, &args);
 	strvec_clear(&to_free);
 
diff --git a/object-file.c b/object-file.c
index a8be8994814..ab593515cec 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1800,12 +1800,17 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 }
 
 /*
- * Move the just written object into its final resting place.
+ * Move the just written object into its final resting place,
+ * unless it is already there, as indicated by an empty string for
+ * tmpfile.
  */
 int finalize_object_file(const char *tmpfile, const char *filename)
 {
 	int ret = 0;
 
+	if (!*tmpfile)
+		goto out;
+
 	if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
 		goto try_rename;
 	else if (link(tmpfile, filename))
@@ -1878,21 +1883,37 @@ static inline int directory_size(const char *filename)
 }
 
 /*
- * This creates a temporary file in the same directory as the final
- * 'filename'
+ * This creates a loose object file for the specified object id.
+ * If we're working in a temporary object directory, the file is
+ * created with its final filename, otherwise it is created with
+ * a temporary name and renamed by finalize_object_file.
+ * If no rename is required, an empty string is returned in tmp.
  *
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+static int create_objfile(const struct object_id *oid, struct strbuf *tmp,
+			  struct strbuf *filename)
 {
-	int fd, dirlen = directory_size(filename);
+	int fd, dirlen, is_retrying = 0;
+	const char *object_name;
+	static const int object_mode = 0444;
 
+	loose_object_path(the_repository, filename, oid);
+	dirlen = directory_size(filename->buf);
+
+retry_create:
 	strbuf_reset(tmp);
-	strbuf_add(tmp, filename, dirlen);
-	strbuf_addstr(tmp, "tmp_obj_XXXXXX");
-	fd = git_mkstemp_mode(tmp->buf, 0444);
-	if (fd < 0 && dirlen && errno == ENOENT) {
+	if (!the_repository->objects->odb->is_temp) {
+		strbuf_add(tmp, filename->buf, dirlen);
+		object_name = "tmp_obj_XXXXXX";
+		strbuf_addstr(tmp, object_name);
+		fd = git_mkstemp_mode(tmp->buf, object_mode);
+	} else {
+		fd = open(filename->buf, O_CREAT | O_EXCL | O_RDWR, object_mode);
+	}
+
+	if (fd < 0 && dirlen && errno == ENOENT && !is_retrying) {
 		/*
 		 * Make sure the directory exists; note that the contents
 		 * of the buffer are undefined after mkstemp returns an
@@ -1900,15 +1921,15 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename)
 		 * scratch.
 		 */
 		strbuf_reset(tmp);
-		strbuf_add(tmp, filename, dirlen - 1);
+		strbuf_add(tmp, filename->buf, dirlen - 1);
 		if (mkdir(tmp->buf, 0777) && errno != EEXIST)
 			return -1;
 		if (adjust_shared_perm(tmp->buf))
 			return -1;
 
 		/* Try again */
-		strbuf_addstr(tmp, "/tmp_obj_XXXXXX");
-		fd = git_mkstemp_mode(tmp->buf, 0444);
+		is_retrying = 1;
+		goto retry_create;
 	}
 	return fd;
 }
@@ -1925,14 +1946,12 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
-	loose_object_path(the_repository, &filename, oid);
-
-	fd = create_tmpfile(&tmp_file, filename.buf);
+	fd = create_objfile(oid, &tmp_file, &filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
 		else
-			return error_errno(_("unable to create temporary file"));
+			return error_errno(_("unable to create object file"));
 	}
 
 	/* Set it up */
diff --git a/object-store.h b/object-store.h
index b4dc6668aa2..f8c883a5730 100644
--- a/object-store.h
+++ b/object-store.h
@@ -26,6 +26,12 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This is a temporary object store, so there is no need to
+	 * create new objects via rename.
+	 */
+	int is_temp;
+
 	/*
 	 * Path to the alternative object store. If this is a relative path,
 	 * it is relative to the current working directory.
diff --git a/repository.c b/repository.c
index b2bf44c6faf..a16de04dfa8 100644
--- a/repository.c
+++ b/repository.c
@@ -80,6 +80,8 @@ void repo_set_gitdir(struct repository *repo,
 	expand_base_dir(&repo->objects->odb->path, o->object_dir,
 			repo->commondir, "objects");
 
+	repo->objects->odb->is_temp = o->object_dir_is_temp;
+
 	free(repo->objects->alternate_db);
 	repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
 	expand_base_dir(&repo->graft_file, o->graft_file,
diff --git a/repository.h b/repository.h
index 3740c93bc0f..d3711367a6f 100644
--- a/repository.h
+++ b/repository.h
@@ -162,6 +162,7 @@ struct set_gitdir_args {
 	const char *graft_file;
 	const char *index_file;
 	const char *alternate_db;
+	int object_dir_is_temp;
 };
 
 void repo_set_gitdir(struct repository *repo, const char *root,
-- 
gitgitgadget



* [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                             ` (6 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'.  This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
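A reduced illustration of the problem, using hypothetical structs rather than the real bulk_checkin_state. When the flag is a member of the state that is zeroed after a packfile is finished, finishing one packfile silently unplugs. A separate static survives the reset.

```c
#include <assert.h>
#include <string.h>

/* Before: the flag lives inside the state that gets zeroed. */
struct state_before {
	unsigned plugged:1;
	int nr_written;
};

static void finish_before(struct state_before *s)
{
	/* ...write out the pack, then reset for the next one... */
	memset(s, 0, sizeof(*s));	/* also clears 'plugged' */
}

/* After: the flag is a separate static that the reset does not touch. */
static int plugged_after;

struct state_after {
	int nr_written;
};

static void finish_after(struct state_after *s)
{
	memset(s, 0, sizeof(*s));	/* 'plugged_after' is unaffected */
}
```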

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index b023d9959aa..f117d62c908 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_bulk_checkin(struct bulk_checkin_state *state)
 {
@@ -260,21 +260,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget



* [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-25  3:15             ` Bagas Sanjaya
  2021-09-24 23:53           ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
                             ` (5 subsequent siblings)
  8 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which issues a pagecache writeback request reliably after version 5.2.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.

When the new mode is enabled we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin we:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the case today,
   but may be a good extension to those components.
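The per-object and end-of-transaction steps above can be sketched with plain POSIX calls plus Linux's sync_file_range(). This is an illustration under stated assumptions, not the series' code: the helper names are hypothetical, platforms without sync_file_range() fall back to a full fsync() per object, and, like the series, it assumes that a single fsync() of a dummy file also flushes the device write cache for the earlier writebacks (plausible when everything lives on one device, but not guaranteed in general).

```c
#define _GNU_SOURCE	/* exposes sync_file_range() in glibc's <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Per object: write back dirty pages and wait, without a FLUSH CACHE. */
static int writeout_only(int fd)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
	return sync_file_range(fd, 0, 0,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	return fsync(fd);	/* no writeout-only primitive: full fsync */
#endif
}

/* End of batch, step 1: one fsync() of a dummy file to flush the device cache. */
static int batch_flush(const char *dir)
{
	char path[4096];
	int fd, ret;

	if (snprintf(path, sizeof(path), "%s/bulk_fsync_XXXXXX", dir) >= (int)sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	unlink(path);
	return ret;
}

/* End of batch, step 2: rename each object from the temporary dir to its final dir. */
static int migrate(const char *tmp_dir, const char *final_dir,
		   const char *const *names, size_t n)
{
	char from[4096], to[4096];
	size_t i;

	for (i = 0; i < n; i++) {
		snprintf(from, sizeof(from), "%s/%s", tmp_dir, names[i]);
		snprintf(to, sizeof(to), "%s/%s", final_dir, names[i]);
		if (rename(from, to))
			return -1;
	}
	return 0;
}
```

The ordering is the point of the design: the renames that make objects visible happen only after the single hardware flush.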

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc.), such as NTFS, HFS+, or XFS,
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.
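The macOS behavior described above can be sketched as a small helper. This is illustrative only, not the wrapper.c code from this patch: the name full_fsync() and the fallback to plain fsync() when fcntl() fails are assumptions. F_FULLFSYNC is only defined on Apple platforms, so elsewhere the helper reduces to fsync().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: request a flush that reaches the drive's cache. */
static int full_fsync(int fd)
{
#ifdef F_FULLFSYNC
	/* macOS: plain fsync() does not flush the drive's write cache. */
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return 0;
	/* Assumed fallback, e.g. for filesystems that reject F_FULLFSYNC. */
#endif
	return fsync(fd);
}
```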

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53
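To put the table in per-file terms, this small helper (hypothetical; it only does arithmetic on the numbers above) computes the per-file cost of `true` and the speedup of `batch` over `true`. The per-file costs come out to roughly 3.8 ms (Linux), 22.4 ms (Mac) and 4.9 ms (Windows). The speedups are about 12.5x, 27.3x and 1.6x respectively.

```c
#include <assert.h>

/* Times in seconds for adding 500 files, copied from the table above. */
struct timing {
	const char *os;
	double off, on, batch;
};

static const struct timing timings[] = {
	{ "Linux",   0.06,  1.88, 0.15 },
	{ "Mac",     0.35, 11.18, 0.41 },
	{ "Windows", 0.61,  2.47, 1.53 },
};

/* Average cost per file in milliseconds. */
static double per_file_ms(double seconds, int files)
{
	return seconds * 1000.0 / files;
}

/* How many times faster 'fast' is than 'slow'. */
static double speedup(double slow, double fast)
{
	return slow / fast;
}
```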

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 29 ++++++++++++---
 Makefile                      |  6 +++
 bulk-checkin.c                | 70 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 +
 cache.h                       |  8 +++-
 config.c                      |  7 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 67 ++++++++++++++++++++++++++++++++-
 object-store.h                | 16 ++++++++
 object.c                      |  2 +-
 tmp-objdir.c                  | 20 +++++++++-
 tmp-objdir.h                  |  6 +++
 wrapper.c                     | 44 ++++++++++++++++++++++
 write-or-die.c                |  2 +-
 17 files changed, 284 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..200b4d9f06e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,29 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls loose objects in the object store, so updates to any
+	refs or the index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each loose object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware. Git does not currently fsync parent directories for
+  newly-added files, so some filesystems may still allow data to be lost on
+  system crash.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write loose object data with a minimal set of FLUSH
+  CACHE (or equivalent) commands sent to the storage controller. If the
+  operating system interfaces are not available, this mode behaves the same as
+  `true`. This mode is expected to be as safe as `true` on macOS for repos
+  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
+  ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 429c276058d..326c7607e0f 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1896,6 +1898,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/bulk-checkin.c b/bulk-checkin.c
index f117d62c908..957a6238684 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -62,6 +68,34 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur.  The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+	}
+
+	if (bulk_fsync_objdir)
+		tmp_objdir_migrate(bulk_fsync_objdir);
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -256,6 +290,26 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
+
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * before as part of do_batch_fsync.
+	 */
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		assert(the_repository->objects->odb->is_temp);
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -270,6 +324,20 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * Create a temporary object directory if the current
+	 * object directory is not already temporary.
+	 */
+	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+	    !the_repository->objects->odb->is_temp) {
+		bulk_fsync_objdir = tmp_objdir_create();
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
+
+		tmp_objdir_replace_main_odb(bulk_fsync_objdir);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -279,4 +347,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index d23de693680..d1897fe9d92 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum fsync_object_files_mode {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum fsync_object_files_mode fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index cb4a8058bff..1b403e00241 100644
--- a/config.c
+++ b/config.c
@@ -1509,7 +1509,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index d9ba68402e9..f318d59e585 100644
--- a/environment.c
+++ b/environment.c
@@ -43,7 +43,7 @@ const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int core_compression_level;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum fsync_object_files_mode fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index b46605300ab..d14e2436276 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1210,6 +1210,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index ab593515cec..ec22560dd66 100644
--- a/object-file.c
+++ b/object-file.c
@@ -750,6 +750,60 @@ void add_to_alternates_memory(const char *reference)
 			     '\n', NULL, 0);
 }
 
+struct object_directory *set_temporary_main_odb(const char *dir)
+{
+	struct object_directory *main_odb, *new_odb, *old_next;
+
+	/*
+	 * Make sure alternates are initialized, or else our entry may be
+	 * overwritten when they are.
+	 */
+	prepare_alt_odb(the_repository);
+
+	/* Copy the existing object directory and make it an alternate. */
+	main_odb = the_repository->objects->odb;
+	new_odb = xmalloc(sizeof(*new_odb));
+	*new_odb = *main_odb;
+	*the_repository->objects->odb_tail = new_odb;
+	the_repository->objects->odb_tail = &(new_odb->next);
+	new_odb->next = NULL;
+
+	/*
+	 * Reinitialize the main odb with the specified path, being careful
+	 * to keep the next pointer value.
+	 */
+	old_next = main_odb->next;
+	memset(main_odb, 0, sizeof(*main_odb));
+	main_odb->next = old_next;
+	main_odb->is_temp = 1;
+	main_odb->path = xstrdup(dir);
+	return new_odb;
+}
+
+void restore_main_odb(struct object_directory *odb)
+{
+	struct object_directory **prev, *main_odb;
+
+	/* Unlink the saved previous main ODB from the list. */
+	prev = &the_repository->objects->odb->next;
+	assert(*prev);
+	while (*prev != odb) {
+		prev = &(*prev)->next;
+	}
+	*prev = odb->next;
+	if (*prev == NULL)
+		the_repository->objects->odb_tail = prev;
+
+	/*
+	 * Restore the data from the old main odb, being careful to
+	 * keep the next pointer value.
+	 */
+	main_odb = the_repository->objects->odb;
+	SWAP(*main_odb, *odb);
+	main_odb->next = odb->next;
+	free_object_directory(odb);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
@@ -1867,8 +1921,19 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 /* Finalize a file on disk, and close it. */
 static void close_loose_object(int fd)
 {
-	if (fsync_object_files)
+	switch (fsync_object_files) {
+	case FSYNC_OBJECT_FILES_OFF:
+		break;
+	case FSYNC_OBJECT_FILES_ON:
 		fsync_or_die(fd, "loose object file");
+		break;
+	case FSYNC_OBJECT_FILES_BATCH:
+		fsync_loose_object_bulk_checkin(fd);
+		break;
+	default:
+		BUG("Invalid fsync_object_files mode.");
+	}
+
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
 }
diff --git a/object-store.h b/object-store.h
index f8c883a5730..9bea14e7f3b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -62,6 +62,19 @@ void add_to_alternates_file(const char *dir);
  */
 void add_to_alternates_memory(const char *dir);
 
+/*
+ * Replace the current main object directory with the specified temporary
+ * object directory. We make a copy of the former main object directory,
+ * add it as an in-memory alternate, and return the copy so that it can
+ * be restored via restore_main_odb.
+ */
+struct object_directory *set_temporary_main_odb(const char *dir);
+
+/*
+ * Restore a previous ODB replaced by set_temporary_main_odb.
+ */
+void restore_main_odb(struct object_directory *odb);
+
 /*
  * Populate and return the loose object cache array corresponding to the
  * given object ID.
@@ -72,6 +85,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
 /* Empty the loose object cache for the specified object directory. */
 void odb_clear_loose_cache(struct object_directory *odb);
 
+/* Clear and free the specified object directory */
+void free_object_directory(struct object_directory *odb);
+
 struct packed_git {
 	struct hashmap_entry packmap_ent;
 	struct packed_git *next;
diff --git a/object.c b/object.c
index 4e85955a941..98635bc4043 100644
--- a/object.c
+++ b/object.c
@@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
 	return o;
 }
 
-static void free_object_directory(struct object_directory *odb)
+void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index b8d880e3626..f027c49db4c 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -11,6 +11,7 @@
 struct tmp_objdir {
 	struct strbuf path;
 	struct strvec env;
+	struct object_directory *prev_main_odb;
 };
 
 /*
@@ -50,8 +51,12 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	 * freeing memory; it may cause a deadlock if the signal
 	 * arrived while libc's allocator lock is held.
 	 */
-	if (!on_signal)
+	if (!on_signal) {
+		if (t->prev_main_odb)
+			restore_main_odb(t->prev_main_odb);
 		tmp_objdir_free(t);
+	}
+
 	return err;
 }
 
@@ -132,6 +137,7 @@ struct tmp_objdir *tmp_objdir_create(void)
 	t = xmalloc(sizeof(*t));
 	strbuf_init(&t->path, 0);
 	strvec_init(&t->env);
+	t->prev_main_odb = NULL;
 
 	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
 
@@ -269,6 +275,11 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
 	if (!t)
 		return 0;
 
+	if (t->prev_main_odb) {
+		restore_main_odb(t->prev_main_odb);
+		t->prev_main_odb = NULL;
+	}
+
 	strbuf_addbuf(&src, &t->path);
 	strbuf_addstr(&dst, get_object_directory());
 
@@ -292,3 +303,10 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
 {
 	add_to_alternates_memory(t->path.buf);
 }
+
+void tmp_objdir_replace_main_odb(struct tmp_objdir *t)
+{
+	if (t->prev_main_odb)
+		BUG("the main object database is already replaced");
+	t->prev_main_odb = set_temporary_main_odb(t->path.buf);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
index b1e45b4c75d..4b898add05b 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -51,4 +51,10 @@ int tmp_objdir_destroy(struct tmp_objdir *);
  */
 void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
 
+/*
+ * Replaces the main object store in the current process with the temporary
+ * object directory and makes the former main object store an alternate.
+ */
+void tmp_objdir_replace_main_odb(struct tmp_objdir *);
+
 #endif /* TMP_OBJDIR_H */
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * On macOS, fsync only causes filesystem cache writeback; it does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index d33e68f6abb..8f53953d4ab 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 160+ messages in thread
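The `set_temporary_main_odb()` / `restore_main_odb()` pair in the patch above does not allocate a new head node: it rewrites the existing head in place, moves a copy of the old contents to the tail of the alternates list, and later reverses both steps. The sketch below replays that list surgery on a hypothetical `struct dir_node` list whose `tail` field mirrors `odb_tail`. It is a simplified standalone model for illustration (no loose-object cache, no `free_object_directory()`), not Git's own code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct object_directory. */
struct dir_node {
	char *path;
	int is_temp;
	struct dir_node *next;
};

/* Hypothetical stand-in for the odb / odb_tail pair. */
struct dir_list {
	struct dir_node *head;
	struct dir_node **tail; /* address of the last node's next pointer */
};

/*
 * Point the head at `dir` in place and append a copy of the old head as
 * the last entry. Returns the copy so the swap can be undone.
 */
static struct dir_node *swap_in_temp_dir(struct dir_list *l, const char *dir)
{
	struct dir_node *head = l->head;
	struct dir_node *saved = malloc(sizeof(*saved));
	struct dir_node *old_next;

	if (!saved)
		abort();
	*saved = *head;           /* copy the old head's contents */
	saved->next = NULL;
	*l->tail = saved;         /* append the copy at the end */
	l->tail = &saved->next;

	old_next = head->next;    /* read only after the append */
	memset(head, 0, sizeof(*head));
	head->next = old_next;
	head->is_temp = 1;
	head->path = strdup(dir);
	return saved;
}

/* Undo swap_in_temp_dir(): unlink the copy and move its contents back. */
static void restore_head(struct dir_list *l, struct dir_node *saved)
{
	struct dir_node *head = l->head;
	struct dir_node **prev = &head->next;
	struct dir_node tmp;

	assert(*prev);
	while (*prev != saved)    /* find the link that points at the copy */
		prev = &(*prev)->next;
	*prev = saved->next;      /* unlink it, fixing the tail if needed */
	if (!*prev)
		l->tail = prev;

	tmp = *head;              /* swap contents, keep head's live chain */
	*head = *saved;
	*saved = tmp;
	head->next = saved->next;

	free(saved->path);        /* the copy now holds the temporary dir */
	free(saved);
}
```

Reading `head->next` only after appending the copy matters for a single-entry list, where the append writes through `head->next` itself.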

* [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
                             ` (2 preceding siblings ...)
  2021-09-24 23:53           ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-27 20:07             ` Junio C Hamano
  2021-09-24 23:53           ` [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                             ` (4 subsequent siblings)
  8 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function being called
has been available since Windows 8. If the function is not available,
we return -1 and Git falls back to doing a full fsync.

The operating system is told to flush only the file data, without
issuing a hardware flush primitive. A later full fsync will cause the
metadata log to be flushed and then the disk cache to be flushed on NTFS
and ReFS. Other filesystems will treat this as a full flush operation.
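The fallback described above (attempt a data-only flush, and do a full fsync when that primitive is missing) can be illustrated with a small POSIX sketch. `writeout_unavailable()` and `writeout_ok()` are hypothetical stand-ins for `win32_fsync_no_flush()` with and without `NtFlushBuffersFileEx`, and `writeout_or_fsync()` is an invented helper, not the series' actual batch-mode caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical "write out file data, skip the disk-cache flush" primitive. */
typedef int (*writeout_fn)(int fd);

/* Models win32_fsync_no_flush() when NtFlushBuffersFileEx cannot be loaded. */
static int writeout_unavailable(int fd)
{
	(void)fd;
	errno = ENOSYS;
	return -1;
}

/* Models a platform where the data-only writeout succeeds. */
static int writeout_ok(int fd)
{
	(void)fd;
	return 0;
}

/*
 * Try the cheap writeout first; if it fails for any reason, fall back to
 * a full fsync(), retrying on EINTR. *used_full_fsync reports the path taken.
 */
static int writeout_or_fsync(int fd, writeout_fn writeout, int *used_full_fsync)
{
	assert(writeout != NULL);
	*used_full_fsync = 0;
	if (writeout(fd) == 0)
		return 0;

	*used_full_fsync = 1;
	while (fsync(fd) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}

/* Create a scratch file from a mkstemp() template and write a few bytes. */
static int open_scratch_file(char *tmpl)
{
	int fd = mkstemp(tmpl);

	if (fd >= 0 && write(fd, "blob", 4) != 4) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The `EINTR` retry loop follows the same pattern as `fsync_or_die()` in write-or-die.c.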

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 28 ++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..75324c24ee7
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,28 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+	IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+	DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+	if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+	}
+
+	memset(&io_status, 0, sizeof(io_status));
+	if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index bb4f9f043ce..1a1e2fba9c9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 
-- 
gitgitgadget



* [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
                             ` (3 preceding siblings ...)
  2021-09-24 23:53           ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 6/8] unpack-objects: " Neeraj Singh via GitGitGadget
                             ` (3 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin for the update-index infrastructure to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality.

There is some risk with this change, since under batch fsync, the object
files will not be available until update-index has entirely completed.
A caller depending on earlier visibility is unlikely, since any tool
invoking update-index and expecting to see objects would have to
synchronize with the update-index process after passing it a file path.
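The plug/unplug bracketing and the visibility caveat described above can be modeled with a few counters. `struct batch_model` and its functions are hypothetical and invented for illustration; they are not the bulk-checkin API. The model only shows why N objects added while plugged cost one flush, and why none of them become visible until unplug.

```c
#include <assert.h>

/* Hypothetical model of bulk-checkin state; counters stand in for disk I/O. */
struct batch_model {
	int plugged;
	int pending;    /* objects written out but not yet renamed into place */
	int visible;    /* objects readers can see in the object database */
	int flushes;    /* expensive hardware cache flushes issued */
};

static void model_plug(struct batch_model *m)
{
	m->plugged = 1;
}

static void model_add_object(struct batch_model *m)
{
	if (m->plugged) {
		m->pending++;      /* writeout only; defer the flush and rename */
	} else {
		m->flushes++;      /* one flush per object, as with =true */
		m->visible++;
	}
}

static void model_unplug(struct batch_model *m)
{
	assert(m->plugged);
	if (m->pending) {
		m->flushes++;      /* a single flush covers every pending object */
		m->visible += m->pending;
		m->pending = 0;
	}
	m->plugged = 0;
}
```

Between `model_plug()` and `model_unplug()`, `visible` stays unchanged, which is the window the commit message flags as a risk for external tools.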

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..dc7368bb1ee 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1088,6 +1089,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1168,6 +1172,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "
-- 
gitgitgadget



* [PATCH v6 6/8] unpack-objects: use the bulk-checkin infrastructure
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
                             ` (4 preceding siblings ...)
  2021-09-24 23:53           ` [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
                             ` (2 subsequent siblings)
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295ba..51eb4f7b531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)
-- 
gitgitgadget



* [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
                             ` (5 preceding siblings ...)
  2021-09-24 23:53           ` [PATCH v6 6/8] unpack-objects: " Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-24 23:53           ` [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add test cases to exercise batch mode for:
 * 'git add'
 * 'git stash'
 * 'git update-index'
 * 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper lib-unique-files.sh. The
goal of this library is to create a tree of files whose oids differ
from those of any other files that may have been created in the current
test repo. This ensures a test cannot pass merely because an object it
expected to be added was already present in the repo.
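As a standalone illustration of the naming scheme used by `test_create_unique_files`, the C sketch computes the same path and contents for a given directory and file index. The counter runs across the whole tree, and the `$test_tick` prefix advances on each call, so separate invocations also produce distinct blobs. `unique_file_entry()` and `all_contents_unique()` are hypothetical helpers, and the 64-entry cap is an arbitrary limit of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill in the path and contents test_create_unique_files would use for
 * file `file` (1-based) in directory `dir` (1-based). The counter runs
 * across the whole tree, so no two files in one call share contents.
 */
static void unique_file_entry(char *path, size_t path_len,
			      char *content, size_t content_len,
			      const char *basedir, long tick,
			      int dir, int file, int files_per_dir)
{
	int counter;

	assert(dir >= 1 && file >= 1 && file <= files_per_dir);
	counter = (dir - 1) * files_per_dir + file;
	snprintf(path, path_len, "%s/dir%d/file%d.txt", basedir, dir, file);
	snprintf(content, content_len, "%ld.%d\n", tick, counter);
}

/* Return 1 if every file in a dirs x files tree gets distinct contents. */
static int all_contents_unique(long tick, int dirs, int files)
{
	char seen[64][32];
	char path[64];
	int n = 0, i, j;

	assert(dirs * files <= 64);
	for (i = 1; i <= dirs; i++)
		for (j = 1; j <= files; j++) {
			unique_file_entry(path, sizeof(path), seen[n], sizeof(seen[n]),
					  "base", tick, i, j, files);
			n++;
		}
	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (strcmp(seen[i], seen[j]) == 0)
				return 0;
	return 1;
}
```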

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/lib-unique-files.sh  | 36 ++++++++++++++++++++++++++++++++++++
 t/t3700-add.sh         | 20 ++++++++++++++++++++
 t/t3903-stash.sh       | 14 ++++++++++++++
 t/t5300-pack-object.sh | 30 +++++++++++++++++++-----------
 4 files changed, 89 insertions(+), 11 deletions(-)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a7de4ca8512
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,36 @@
+# Helper to create files with unique contents
+
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
+#					 each in my_dir, all with unique
+#					 contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+	local counter=0
+	test_tick
+	local basedata=$test_tick
+
+
+	rm -rf $basedir
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir"
+		for j in $(test_seq $files)
+		do
+			counter=$((counter + 1))
+			echo "$basedata.$counter"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..36049a53ff7 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.'
 
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -33,6 +35,24 @@ test_expect_success \
     'Test that "git add -- -q" works' \
     'touch -- -q && git add -- -q'
 
+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files2 &&
+	find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
+	rm -f fsynced_files2 &&
+	git ls-files --stage fsync-files2/ > fsynced_files2 &&
+	test_line_count = 8 fsynced_files2 &&
+	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 873aa56e359..2fc819e5584 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index e13a8842075..38663dc1393 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -162,23 +162,23 @@ test_expect_success 'pack-objects with bogus arguments' '
 
 check_unpack () {
 	test_when_finished "rm -rf git2" &&
-	git init --bare git2 &&
-	git -C git2 unpack-objects -n <"$1".pack &&
-	git -C git2 unpack-objects <"$1".pack &&
-	(cd .git && find objects -type f -print) |
-	while read path
-	do
-		cmp git2/$path .git/$path || {
-			echo $path differs.
-			return 1
-		}
-	done
+	git $2 init --bare git2 &&
+	(
+		git $2 -C git2 unpack-objects -n <"$1".pack &&
+		git $2 -C git2 unpack-objects <"$1".pack &&
+		git $2 -C git2 cat-file --batch-check="%(objectname)"
+	) <obj-list >current &&
+	cmp obj-list current
 }
 
 test_expect_success 'unpack without delta' '
 	check_unpack test-1-${packname_1}
 '
 
+test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
+	check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with REF_DELTA' '
 	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
 	check_deltas stderr -gt 0
@@ -188,6 +188,10 @@ test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
+test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
+	check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with OFS_DELTA' '
 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
 			<obj-list 2>stderr) &&
@@ -198,6 +202,10 @@ test_expect_success 'unpack with OFS_DELTA' '
 	check_unpack test-3-${packname_3}
 '
 
+test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
+	check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
 		defined($_ = -s $_) or die for @ARGV;
-- 
gitgitgadget



* [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
                             ` (6 preceding siblings ...)
  2021-09-24 23:53           ` [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-24 23:53           ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  8 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-24 23:53 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget


* Re: [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes
  2021-09-24 23:53           ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-25  3:15             ` Bagas Sanjaya
  2021-09-27  0:27               ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Bagas Sanjaya @ 2021-09-25  3:15 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget, git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Neeraj K. Singh

On 25/09/21 06.53, Neeraj Singh via GitGitGadget wrote:
> At the end of the entire transaction when unplugging bulk checkin we:
> 1. Issue an fsync against a dummy file to flush the hardware writeback
>     cache, which should by now have processed the tmp-objdir writes.
> 2. Rename all of the tmp-objdir files to their final names.
> 3. When updating the index and/or refs, we assume that Git will issue
>     another fsync internal to that operation. This is not the case today,
>     but may be a good extension to those components.

The 'we' can be stripped from the lead-in sentence, because only points 1
and 2 infer their subject from it; that subject should instead be stated
explicitly in each item, like:

```
At the end of ... <snip>.:
1. We issue an fsync ... <snip>.
2. We rename ... <snip>.
3. When ... <snip>, we assume <snip>. (stays same)
```

-- 
An old man doll... just what I always wanted! - Clara

* Re: [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes
  2021-09-25  3:15             ` Bagas Sanjaya
@ 2021-09-27  0:27               ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-27  0:27 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Neeraj K. Singh

On Fri, Sep 24, 2021 at 8:15 PM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On 25/09/21 06.53, Neeraj Singh via GitGitGadget wrote:
> > At the end of the entire transaction when unplugging bulk checkin we:
> > 1. Issue an fsync against a dummy file to flush the hardware writeback
> >     cache, which should by now have processed the tmp-objdir writes.
> > 2. Rename all of the tmp-objdir files to their final names.
> > 3. When updating the index and/or refs, we assume that Git will issue
> >     another fsync internal to that operation. This is not the case today,
> >     but may be a good extension to those components.
>
> The 'we' can be stripped from the lead-in sentence, because only points 1
> and 2 infer their subject from it; that subject should instead be stated
> explicitly in each item, like:
>
> ```
> At the end of ... <snip>.:
> 1. We issue an fsync ... <snip>.
> 2. We rename ... <snip>.
> 3. When ... <snip>, we assume <snip>. (stays same)
> ```

I'll fix this in the GitHub PR so that it will ride along with any
other re-roll.

Thanks,
Neeraj

* Re: [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-24 23:53           ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-27 20:07             ` Junio C Hamano
  2021-09-27 20:55               ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Junio C Hamano @ 2021-09-27 20:07 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

"Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/compat/mingw.h b/compat/mingw.h
> index c9a52ad64a6..6074a3d3ced 100644
> --- a/compat/mingw.h
> +++ b/compat/mingw.h
> @@ -329,6 +329,9 @@ int mingw_getpagesize(void);
>  #define getpagesize mingw_getpagesize
>  #endif
>  
> +int win32_fsync_no_flush(int fd);
> +#define fsync_no_flush win32_fsync_no_flush

...

> diff --git a/wrapper.c b/wrapper.c
> index bb4f9f043ce..1a1e2fba9c9 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
>  						 SYNC_FILE_RANGE_WAIT_AFTER);
>  #endif
>  
> +#ifdef fsync_no_flush
> +		return fsync_no_flush(fd);
> +#endif
> +
>  		errno = ENOSYS;
>  		return -1;

This almost makes me wonder if we want to have a fallback
implementation of fsync_no_flush() that does

   int fsync_no_flush(int unused)
   {
	errno = ENOSYS;
	return -1;
   }

when no platform defines its own fsync_no_flush() (the way Windows
does).  That way, this codepath does not have to have #ifdef/#endif here.

This function is already #ifdef-ridden anyway, so reducing just one
instance may not make much difference, but since I noticed it ...

Thanks.
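A minimal sketch of the fallback described above, assuming a platform header may `#define fsync_no_flush` to its own implementation (as `compat/mingw.h` does with `win32_fsync_no_flush`). The caller `writeout_only()` is a hypothetical, simplified stand-in for the relevant branch of `git_fsync()`, not the actual code in `wrapper.c`:

```c
#include <errno.h>

/*
 * A platform header (for example compat/mingw.h) may already have done:
 *   #define fsync_no_flush win32_fsync_no_flush
 * If no platform did, supply a stub that reports "not supported".
 */
#ifndef fsync_no_flush
static int fsync_no_flush(int fd)
{
	(void)fd; /* unused in the fallback */
	errno = ENOSYS;
	return -1;
}
#endif

/*
 * Hypothetical simplification of the writeout-only branch of
 * git_fsync(): with the fallback above in place, the call site
 * itself needs no #ifdef/#endif.
 */
static int writeout_only(int fd)
{
	return fsync_no_flush(fd);
}
```

Note that the fallback itself still needs an inverted `#ifndef`, which is the trade-off raised in the replies that follow.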

* Re: [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-27 20:07             ` Junio C Hamano
@ 2021-09-27 20:55               ` Neeraj Singh
  2021-09-27 21:03                 ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-27 20:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

On Mon, Sep 27, 2021 at 1:07 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/compat/mingw.h b/compat/mingw.h
> > index c9a52ad64a6..6074a3d3ced 100644
> > --- a/compat/mingw.h
> > +++ b/compat/mingw.h
> > @@ -329,6 +329,9 @@ int mingw_getpagesize(void);
> >  #define getpagesize mingw_getpagesize
> >  #endif
> >
> > +int win32_fsync_no_flush(int fd);
> > +#define fsync_no_flush win32_fsync_no_flush
>
> ...
>
> > diff --git a/wrapper.c b/wrapper.c
> > index bb4f9f043ce..1a1e2fba9c9 100644
> > --- a/wrapper.c
> > +++ b/wrapper.c
> > @@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
> >                                                SYNC_FILE_RANGE_WAIT_AFTER);
> >  #endif
> >
> > +#ifdef fsync_no_flush
> > +             return fsync_no_flush(fd);
> > +#endif
> > +
> >               errno = ENOSYS;
> >               return -1;
>
> This almost makes me wonder if we want to have a fallback
> implementation of fsync_no_flush() that does
>
>    int fsync_no_flush(int unused)
>    {
>         errno = ENOSYS;
>         return -1;
>    }
>
> when no platform defines its own fsync_no_flush() (the way Windows
> does).  That way, this codepath does not have to have #ifdef/#endif here.
>
> This function is already #ifdef-ridden anyway, so reducing just one
> instance may not make much difference, but since I noticed it ...
>
> Thanks.

I'll make your suggested change on GitHub so that it will be available if
we do another re-roll.

Thanks,
Neeraj

* Re: [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-27 20:55               ` Neeraj Singh
@ 2021-09-27 21:03                 ` Neeraj Singh
  2021-09-27 23:53                   ` Junio C Hamano
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-09-27 21:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

On Mon, Sep 27, 2021 at 1:55 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Mon, Sep 27, 2021 at 1:07 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > "Neeraj Singh via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> > > diff --git a/compat/mingw.h b/compat/mingw.h
> > > index c9a52ad64a6..6074a3d3ced 100644
> > > --- a/compat/mingw.h
> > > +++ b/compat/mingw.h
> > > @@ -329,6 +329,9 @@ int mingw_getpagesize(void);
> > >  #define getpagesize mingw_getpagesize
> > >  #endif
> > >
> > > +int win32_fsync_no_flush(int fd);
> > > +#define fsync_no_flush win32_fsync_no_flush
> >
> > ...
> >
> > > diff --git a/wrapper.c b/wrapper.c
> > > index bb4f9f043ce..1a1e2fba9c9 100644
> > > --- a/wrapper.c
> > > +++ b/wrapper.c
> > > @@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
> > >                                                SYNC_FILE_RANGE_WAIT_AFTER);
> > >  #endif
> > >
> > > +#ifdef fsync_no_flush
> > > +             return fsync_no_flush(fd);
> > > +#endif
> > > +
> > >               errno = ENOSYS;
> > >               return -1;
> >
> > This almost makes me wonder if we want to have a fallback
> > implementation of fsync_no_flush() that does
> >
> >    int fsync_no_flush(int unused)
> >    {
> >         errno = ENOSYS;
> >         return -1;
> >    }
> >
> > when no platform defines its own fsync_no_flush() (the way Windows
> > does).  That way, this codepath does not have to have #ifdef/#endif here.
> >
> > This function is already #ifdef-ridden anyway, so reducing just one
> > instance may not make much difference, but since I noticed it ...
> >
> > Thanks.
>
> I'll make your suggested change on GitHub so that it will be available if
> we do another re-roll.
>
> Thanks,
> Neeraj

Actually, while trying your suggestion, I concluded that we'd either
have the inverse #ifdef around the fsync_no_flush fallback, an #undef,
or some other confusing state.  The current ifdeffery is unpleasant to
read, but it is not too long and is pretty direct.  Win32 has an extra
level of indirection, but the Unix platforms' syscalls are written
directly in one place.

Thanks,
Neeraj

* Re: [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-27 21:03                 ` Neeraj Singh
@ 2021-09-27 23:53                   ` Junio C Hamano
  0 siblings, 0 replies; 160+ messages in thread
From: Junio C Hamano @ 2021-09-27 23:53 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

Neeraj Singh <nksingh85@gmail.com> writes:

> ....  The current ifdeffery is unpleasant to read, but it is not too
> long and is pretty direct.  Win32 has an extra level of indirection,
> but the Unix platforms' syscalls are written directly in one place.

Yes, that is exactly why I concluded that reducing just one instance
would not make that much difference ;-)

Thanks.

* [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
                             ` (7 preceding siblings ...)
  2021-09-24 23:53           ` [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32           ` Neeraj K. Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
                               ` (9 more replies)
  8 siblings, 10 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

Thanks to everyone for review so far!

The patch is now at version 7: changes since v6:

 * Rebased onto current upstream master

 * Separate the tmp-objdir changes and move to the beginning of the series
   so that Elijah Newren's similar changes can be merged.

 * Use some of Elijah's implementation for replacing the primary ODB. I was
   doing some unnecessarily complex copying for no good reason.

 * Make the tmp objdir code use a name beginning with tmp_ and having an
   operation-specific prefix.

 * Add git-prune support for removing a stale object directory.
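As a rough illustration of the naming scheme in the bullets above: a sketch that creates a uniquely named temporary object directory whose name starts with `tmp_` and carries an operation-specific prefix, plus the kind of prefix check a pruner could use to spot stale ones. The exact `tmp_objdir-<prefix>-XXXXXX` layout and both helper names are assumptions for illustration; only the `tmp_` prefix and the per-operation prefix (e.g. `bulk-fsync`, as passed to `tmp_objdir_create()` in the range-diff) come from this series.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: create "<objdir>/tmp_objdir-<prefix>-XXXXXX"
 * with mkdtemp() and leave the resulting path in buf.
 * Returns 0 on success, -1 if buf is too small or mkdtemp() fails.
 */
static int create_tmp_objdir(const char *objdir, const char *prefix,
			     char *buf, size_t len)
{
	int n = snprintf(buf, len, "%s/tmp_objdir-%s-XXXXXX", objdir, prefix);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return mkdtemp(buf) ? 0 : -1;
}

/*
 * Hypothetical test a pruner could apply to each entry of the object
 * directory: names starting with "tmp_" are candidates for removal
 * (a real pruner would also check that they are old enough).
 */
static int looks_like_tmp_objdir(const char *name)
{
	return strncmp(name, "tmp_", 4) == 0;
}
```

Because every temporary directory shares the `tmp_` prefix, a cleanup pass only needs a prefix match plus an age check, while the per-operation part of the name makes it easier to tell which command left a stale directory behind.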

v5 was a bit of a dud, with some issues that I only noticed after
submitting. v6 changes:

 * re-add Windows support
 * fix minor formatting issues
 * reset git author and commit dates which got messed up

Changes since v4, all in response to review feedback from Ævar Arnfjörð
Bjarmason:

 * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
   to add a statement about not fsyncing parent directories.
   
   * I still don't want to make any promises on behalf of the Linux FS developers
     in the documentation. However, according to [v4.1] and my understanding
     of how XFS journals are documented to work, it looks like recent versions
     of Linux running on XFS should be as safe as Windows or macOS in 'batch'
     mode. I don't know about ext4, since it's not clear to me when metadata
     updates are made visible to the journal.
   

 * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
   pointed out, this lets us access the added loose objects immediately,
   rather than only after unplugging the bulk checkin. This is a hard
   requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
   
   * As a preparatory patch, the object-file code now doesn't do a rename if it's in a
     tmp objdir (as determined by the quarantine environment variable).
   
   * I added support to the tmp-objdir lib to replace the 'main' writable odb.
   
   * Instead of using a lockfile for the final full fsync, we now use a new dummy
     temp file. Doing that makes the below unpack-objects change easier.
   

 * Add bulk-checkin support to unpack-objects, which is used in fetch and
   push. In addition to making those operations faster, it allows us to
   directly compare performance of packfiles against loose objects. Please
   see [v4.2] for a measurement of 'git push' to a local upstream with
   different numbers of unique new files.

 * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.

 * Remove comment with link to NtFlushBuffersFileEx documentation.

 * Make t/lib-unique-files.sh a bit cleaner. We are still creating unique
   contents, but now this uses test_tick, so it should be deterministic from
   run to run.

 * Ensure there are tests for all of the modified commands. Make the
   unpack-objects tests validate that the unpacked objects are really
   available in the ODB.
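The batch-mode sequence this series implements (per object: write it into a temporary object directory and request pagecache writeback without a device cache flush; when unplugging: one fsync on a dummy file, then rename everything into place) can be sketched in plain POSIX C. This is a simplified illustration under stated assumptions, not Git's implementation: `writeout_no_flush`, `write_staged`, `finish_batch` and `join` are hypothetical names, `sync_file_range()` is Linux-only (elsewhere this sketch falls back to a full `fsync()`), and whether the final fsync makes the renamed files durable depends on the filesystem's journaling, as the notes above caution.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Build "dir/name" into out; -1 if it does not fit. */
static int join(char *out, size_t len, const char *dir, const char *name)
{
	int n = snprintf(out, len, "%s/%s", dir, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Step 1 (per object): write dirty pages to the device and wait,
 * without asking the device to flush its volatile cache.
 */
static int writeout_no_flush(int fd)
{
#ifdef __linux__
	return sync_file_range(fd, 0, 0,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	return fsync(fd); /* assumption: no cheaper primitive available */
#endif
}

/* Write one object's bytes under its final name in the temp dir. */
static int write_staged(const char *tmpdir, const char *name,
			const void *buf, size_t len)
{
	char path[4096];
	int fd, ret = 0;

	if (join(path, sizeof(path), tmpdir, name))
		return -1;
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0444);
	if (fd < 0)
		return -1;
	if (write(fd, buf, len) != (ssize_t)len || writeout_no_flush(fd))
		ret = -1;
	if (close(fd))
		ret = -1;
	return ret;
}

/* Steps 2 and 3 (once per batch): one full flush, then renames. */
static int finish_batch(const char *tmpdir, const char *finaldir,
			const char **names, size_t n)
{
	char src[4096], dst[4096];
	size_t i;
	int fd, ret;

	/* A single fsync on a throwaway file flushes the device cache. */
	if (join(src, sizeof(src), tmpdir, "dummy-XXXXXX"))
		return -1;
	fd = mkstemp(src);
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	unlink(src);
	if (ret)
		return -1;

	/* Move every staged file to its final location. */
	for (i = 0; i < n; i++) {
		if (join(src, sizeof(src), tmpdir, names[i]) ||
		    join(dst, sizeof(dst), finaldir, names[i]) ||
		    rename(src, dst))
			return -1;
	}
	return 0;
}
```

The saving comes from the final step: one device cache flush per batch rather than one per object, as with `core.fsyncObjectFiles=true`.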

References for v4: [v4.1]
https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/#t

[v4.2]
https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117

Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
   accept no value to mean "true" and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for Windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.
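The option-parsing rule in the first bullet above can be sketched as follows. The enum constants match the `FSYNC_OBJECT_FILES_*` names visible in the range-diff; `parse_bool()` is a simplified, hypothetical stand-in for Git's own boolean parsing (it accepts fewer spellings, e.g. no integers other than 0 and 1), and `parse_fsync_object_files()` is an illustrative name, not the function in the patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <strings.h>

enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/* Simplified boolean parser: 1, 0, or -1 if unrecognized. */
static int parse_bool(const char *v)
{
	if (!strcasecmp(v, "true") || !strcasecmp(v, "yes") ||
	    !strcasecmp(v, "on") || !strcmp(v, "1"))
		return 1;
	if (!*v || !strcasecmp(v, "false") || !strcasecmp(v, "no") ||
	    !strcasecmp(v, "off") || !strcmp(v, "0"))
		return 0;
	return -1;
}

/*
 * Map a core.fsyncObjectFiles value to a mode. A missing value (NULL,
 * i.e. the key is present with no '=') means "true"; "batch" must be
 * lowercase. Returns -1 for an unrecognized value.
 */
static int parse_fsync_object_files(const char *value,
				    enum fsync_object_files_mode *out)
{
	int b;

	if (!value) {
		*out = FSYNC_OBJECT_FILES_ON;
		return 0;
	}
	if (!strcmp(value, "batch")) {
		*out = FSYNC_OBJECT_FILES_BATCH;
		return 0;
	}
	b = parse_bool(value);
	if (b < 0)
		return -1;
	*out = b ? FSYNC_OBJECT_FILES_ON : FSYNC_OBJECT_FILES_OFF;
	return 0;
}
```

Matching "batch" with `strcmp()` rather than a case-insensitive comparison reflects the requirement that it be lowercase, while the boolean spellings stay case-insensitive.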

Neeraj Singh (9):
  object-file.c: do not rename in a temp odb
  tmp-objdir: new API for creating temporary writable databases
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       |  29 ++++++--
 Makefile                            |   6 ++
 builtin/prune.c                     |  22 ++++--
 builtin/receive-pack.c              |   2 +-
 builtin/unpack-objects.c            |   3 +
 builtin/update-index.c              |   6 ++
 bulk-checkin.c                      |  92 +++++++++++++++++++++---
 bulk-checkin.h                      |   2 +
 cache.h                             |   8 ++-
 compat/mingw.h                      |   3 +
 compat/win32/flush.c                |  28 ++++++++
 config.c                            |   7 +-
 config.mak.uname                    |   3 +
 configure.ac                        |   8 +++
 contrib/buildsystems/CMakeLists.txt |   3 +-
 environment.c                       |   6 +-
 git-compat-util.h                   |   7 ++
 object-file.c                       | 106 +++++++++++++++++++++++-----
 object-store.h                      |  25 +++++++
 object.c                            |   2 +-
 repository.c                        |   2 +
 repository.h                        |   1 +
 t/lib-unique-files.sh               |  36 ++++++++++
 t/perf/p3700-add.sh                 |  43 +++++++++++
 t/perf/p3900-stash.sh               |  46 ++++++++++++
 t/t3700-add.sh                      |  20 ++++++
 t/t3903-stash.sh                    |  14 ++++
 t/t5300-pack-object.sh              |  30 +++++---
 tmp-objdir.c                        |  30 +++++++-
 tmp-objdir.h                        |  14 +++-
 wrapper.c                           |  48 +++++++++++++
 write-or-die.c                      |   2 +-
 32 files changed, 592 insertions(+), 62 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v7
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v6:

  1:  e4081f81f6a =  1:  6e65f68fd6d object-file.c: do not rename in a temp odb
  -:  ----------- >  2:  6ce72a709a1 tmp-objdir: new API for creating temporary writable databases
  2:  ebba65e040c !  3:  c272f8776fa bulk-checkin: rename 'state' variable and separate 'plugged' boolean
     @@ bulk-checkin.c: static struct bulk_checkin_state {
      -} state;
      +} bulk_checkin_state;
       
     - static void finish_bulk_checkin(struct bulk_checkin_state *state)
     - {
     + static void finish_tmp_packfile(struct strbuf *basename,
     + 				const char *pack_tmp_name,
      @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
       		       int fd, size_t size, enum object_type type,
       		       const char *path, unsigned flags)
  3:  543ea356934 !  4:  55556bb3e90 core.fsyncobjectfiles: batched disk flushes
     @@ Commit message
          batches up hardware flushes. It hooks into the bulk-checkin plugging and
          unplugging functionality and takes advantage of tmp-objdir.
      
     -    When the new mode is enabled we do the following for each new object:
     +    When the new mode is enabled do the following for each new object:
          1. Create the object in a tmp-objdir.
          2. Issue a pagecache writeback request and wait for it to complete.
      
     -    At the end of the entire transaction when unplugging bulk checkin we:
     +    At the end of the entire transaction when unplugging bulk checkin:
          1. Issue an fsync against a dummy file to flush the hardware writeback
             cache, which should by now have processed the tmp-objdir writes.
          2. Rename all of the tmp-objdir files to their final names.
     @@ Commit message
             but may be a good extension to those components.
      
          On a filesystem with a singular journal that is updated during name
     -    operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we
     -    would expect the fsync to trigger a journal writeout so that this
     +    operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
     +    we would expect the fsync to trigger a journal writeout so that this
          sequence is enough to ensure that the user's data is durable by the time
          the git command returns.
      
     @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
      +	 */
      +	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
      +	    !the_repository->objects->odb->is_temp) {
     -+		bulk_fsync_objdir = tmp_objdir_create();
     ++		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
      +		if (!bulk_fsync_objdir)
      +			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
      +
     -+		tmp_objdir_replace_main_odb(bulk_fsync_objdir);
     ++		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
      +	}
      +
       	bulk_checkin_plugged = 1;
     @@ configure.ac: AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
       GIT_CHECK_FUNC(setitimer,
      
       ## environment.c ##
     -@@ environment.c: const char *git_hooks_path;
     +@@ environment.c: const char *git_attributes_file;
     + const char *git_hooks_path;
       int zlib_compression_level = Z_BEST_SPEED;
     - int core_compression_level;
       int pack_compression_level = Z_DEFAULT_COMPRESSION;
      -int fsync_object_files;
      +enum fsync_object_files_mode fsync_object_files;
     @@ git-compat-util.h: __attribute__((format (printf, 1, 2))) NORETURN
        * Returns 0 on success, which includes trying to unlink an object that does
      
       ## object-file.c ##
     -@@ object-file.c: void add_to_alternates_memory(const char *reference)
     - 			     '\n', NULL, 0);
     - }
     - 
     -+struct object_directory *set_temporary_main_odb(const char *dir)
     -+{
     -+	struct object_directory *main_odb, *new_odb, *old_next;
     -+
     -+	/*
     -+	 * Make sure alternates are initialized, or else our entry may be
     -+	 * overwritten when they are.
     -+	 */
     -+	prepare_alt_odb(the_repository);
     -+
     -+	/* Copy the existing object directory and make it an alternate. */
     -+	main_odb = the_repository->objects->odb;
     -+	new_odb = xmalloc(sizeof(*new_odb));
     -+	*new_odb = *main_odb;
     -+	*the_repository->objects->odb_tail = new_odb;
     -+	the_repository->objects->odb_tail = &(new_odb->next);
     -+	new_odb->next = NULL;
     -+
     -+	/*
     -+	 * Reinitialize the main odb with the specified path, being careful
     -+	 * to keep the next pointer value.
     -+	 */
     -+	old_next = main_odb->next;
     -+	memset(main_odb, 0, sizeof(*main_odb));
     -+	main_odb->next = old_next;
     -+	main_odb->is_temp = 1;
     -+	main_odb->path = xstrdup(dir);
     -+	return new_odb;
     -+}
     -+
     -+void restore_main_odb(struct object_directory *odb)
     -+{
     -+	struct object_directory **prev, *main_odb;
     -+
     -+	/* Unlink the saved previous main ODB from the list. */
     -+	prev = &the_repository->objects->odb->next;
     -+	assert(*prev);
     -+	while (*prev != odb) {
     -+		prev = &(*prev)->next;
     -+	}
     -+	*prev = odb->next;
     -+	if (*prev == NULL)
     -+		the_repository->objects->odb_tail = prev;
     -+
     -+	/*
     -+	 * Restore the data from the old main odb, being careful to
     -+	 * keep the next pointer value
     -+	 */
     -+	main_odb = the_repository->objects->odb;
     -+	SWAP(*main_odb, *odb);
     -+	main_odb->next = odb->next;
     -+	free_object_directory(odb);
     -+}
     -+
     - /*
     -  * Compute the exact path an alternate is at and returns it. In case of
     -  * error NULL is returned and the human readable error is added to `err`
      @@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void *buf,
     - /* Finalize a file on disk, and close it. */
       static void close_loose_object(int fd)
       {
     --	if (fsync_object_files)
     -+	switch (fsync_object_files) {
     -+	case FSYNC_OBJECT_FILES_OFF:
     -+		break;
     -+	case FSYNC_OBJECT_FILES_ON:
     - 		fsync_or_die(fd, "loose object file");
     -+		break;
     -+	case FSYNC_OBJECT_FILES_BATCH:
     -+		fsync_loose_object_bulk_checkin(fd);
     -+		break;
     -+	default:
     -+		BUG("Invalid fsync_object_files mode.");
     -+	}
     -+
     - 	if (close(fd) != 0)
     - 		die_errno(_("error when closing loose object file"));
     - }
     -
     - ## object-store.h ##
     -@@ object-store.h: void add_to_alternates_file(const char *dir);
     -  */
     - void add_to_alternates_memory(const char *dir);
     - 
     -+/*
     -+ * Replace the current main object directory with the specified temporary
     -+ * object directory. We make a copy of the former main object directory,
     -+ * add it as an in-memory alternate, and return the copy so that it can
     -+ * be restored via restore_main_odb.
     -+ */
     -+struct object_directory *set_temporary_main_odb(const char *dir);
     -+
     -+/*
     -+ * Restore a previous ODB replaced by set_temporary_main_odb.
     -+ */
     -+void restore_main_odb(struct object_directory *odb);
     -+
     - /*
     -  * Populate and return the loose object cache array corresponding to the
     -  * given object ID.
     -@@ object-store.h: struct oidtree *odb_loose_cache(struct object_directory *odb,
     - /* Empty the loose object cache for the specified object directory. */
     - void odb_clear_loose_cache(struct object_directory *odb);
     - 
     -+/* Clear and free the specified object directory */
     -+void free_object_directory(struct object_directory *odb);
     -+
     - struct packed_git {
     - 	struct hashmap_entry packmap_ent;
     - 	struct packed_git *next;
     -
     - ## object.c ##
     -@@ object.c: struct raw_object_store *raw_object_store_new(void)
     - 	return o;
     - }
     + 	if (!the_repository->objects->odb->will_destroy) {
     +-		if (fsync_object_files)
     ++		switch (fsync_object_files) {
     ++		case FSYNC_OBJECT_FILES_OFF:
     ++			break;
     ++		case FSYNC_OBJECT_FILES_ON:
     + 			fsync_or_die(fd, "loose object file");
     ++			break;
     ++		case FSYNC_OBJECT_FILES_BATCH:
     ++			fsync_loose_object_bulk_checkin(fd);
     ++			break;
     ++		default:
     ++			BUG("Invalid fsync_object_files mode.");
     ++		}
     + 	}
       
     --static void free_object_directory(struct object_directory *odb)
     -+void free_object_directory(struct object_directory *odb)
     - {
     - 	free(odb->path);
     - 	odb_clear_loose_cache(odb);
     + 	if (close(fd) != 0)
      
       ## tmp-objdir.c ##
     -@@
     - struct tmp_objdir {
     - 	struct strbuf path;
     - 	struct strvec env;
     -+	struct object_directory *prev_main_odb;
     - };
     - 
     - /*
     -@@ tmp-objdir.c: static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
     - 	 * freeing memory; it may cause a deadlock if the signal
     - 	 * arrived while libc's allocator lock is held.
     - 	 */
     --	if (!on_signal)
     -+	if (!on_signal) {
     -+		if (t->prev_main_odb)
     -+			restore_main_odb(t->prev_main_odb);
     - 		tmp_objdir_free(t);
     -+	}
     -+
     - 	return err;
     - }
     - 
     -@@ tmp-objdir.c: struct tmp_objdir *tmp_objdir_create(void)
     - 	t = xmalloc(sizeof(*t));
     - 	strbuf_init(&t->path, 0);
     - 	strvec_init(&t->env);
     -+	t->prev_main_odb = NULL;
     - 
     - 	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
     - 
      @@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
       	if (!t)
       		return 0;
       
     -+	if (t->prev_main_odb) {
     -+		restore_main_odb(t->prev_main_odb);
     -+		t->prev_main_odb = NULL;
     -+	}
     -+
     - 	strbuf_addbuf(&src, &t->path);
     - 	strbuf_addstr(&dst, get_object_directory());
     - 
     -@@ tmp-objdir.c: void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
     - {
     - 	add_to_alternates_memory(t->path.buf);
     - }
     -+
     -+void tmp_objdir_replace_main_odb(struct tmp_objdir *t)
     -+{
     -+	if (t->prev_main_odb)
     -+		BUG("the main object database is already replaced");
     -+	t->prev_main_odb = set_temporary_main_odb(t->path.buf);
     -+}
     -
     - ## tmp-objdir.h ##
     -@@ tmp-objdir.h: int tmp_objdir_destroy(struct tmp_objdir *);
     -  */
     - void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
     - 
     -+/*
     -+ * Replaces the main object store in the current process with the temporary
     -+ * object directory and makes the former main object store an alternate.
     -+ */
     -+void tmp_objdir_replace_main_odb(struct tmp_objdir *);
     -+
     - #endif /* TMP_OBJDIR_H */
     +-
     +-
     + 	if (t->prev_odb) {
     + 		if (the_repository->objects->odb->will_destroy)
     + 			BUG("migrating and ODB that was marked for destruction");
      
       ## wrapper.c ##
      @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
  4:  bdb99822f8c =  5:  6c33e79d6f0 core.fsyncobjectfiles: add windows support for batch mode
  5:  92e18cedab0 =  6:  09dbff1004e update-index: use the bulk-checkin infrastructure
  6:  e3c5a11f225 =  7:  1eced9f9f9a unpack-objects: use the bulk-checkin infrastructure
  7:  385199354fa =  8:  7aaa08d5f5f core.fsyncobjectfiles: tests for batch mode
  8:  504bcc95c56 =  9:  ff286fb461a core.fsyncobjectfiles: performance tests for add and stash

-- 
gitgitgadget

* [PATCH v7 1/9] object-file.c: do not rename in a temp odb
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:55               ` Jeff King
  2021-09-28 23:32             ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
                               ` (8 subsequent siblings)
  9 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH
being set, create object files with their final names. This avoids
an extra rename beyond what is needed to merge the temporary ODB in
tmp_objdir_migrate.

Creating an object file with the expected final name should be okay
since the git process writing to the temporary object store is the
only writer, and it only invokes write_loose_object/create_object_file
after checking that the object doesn't exist.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 environment.c  |  4 ++++
 object-file.c  | 51 ++++++++++++++++++++++++++++++++++----------------
 object-store.h |  6 ++++++
 repository.c   |  2 ++
 repository.h   |  1 +
 5 files changed, 48 insertions(+), 16 deletions(-)

diff --git a/environment.c b/environment.c
index b4ba4fa22db..30fca67e6d6 100644
--- a/environment.c
+++ b/environment.c
@@ -176,6 +176,10 @@ void setup_git_env(const char *git_dir)
 	args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT);
 	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
 	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
+	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+		args.object_dir_is_temp = 1;
+	}
+
 	repo_set_gitdir(the_repository, git_dir, &args);
 	strvec_clear(&to_free);
 
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..49c53f801f7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1826,12 +1826,17 @@ static void write_object_file_prepare(const struct git_hash_algo *algo,
 }
 
 /*
- * Move the just written object into its final resting place.
+ * Move the just written object into its final resting place,
+ * unless it is already there, as indicated by an empty string for
+ * tmpfile.
  */
 int finalize_object_file(const char *tmpfile, const char *filename)
 {
 	int ret = 0;
 
+	if (!*tmpfile)
+		goto out;
+
 	if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
 		goto try_rename;
 	else if (link(tmpfile, filename))
@@ -1904,21 +1909,37 @@ static inline int directory_size(const char *filename)
 }
 
 /*
- * This creates a temporary file in the same directory as the final
- * 'filename'
+ * This creates a loose object file for the specified object id.
+ * If we're working in a temporary object directory, the file is
+ * created with its final filename, otherwise it is created with
+ * a temporary name and renamed by finalize_object_file.
+ * If no rename is required, an empty string is returned in tmp.
  *
  * We want to avoid cross-directory filename renames, because those
  * can have problems on various filesystems (FAT, NFS, Coda).
  */
-static int create_tmpfile(struct strbuf *tmp, const char *filename)
+static int create_objfile(const struct object_id *oid, struct strbuf *tmp,
+			  struct strbuf *filename)
 {
-	int fd, dirlen = directory_size(filename);
+	int fd, dirlen, is_retrying = 0;
+	const char *object_name;
+	static const int object_mode = 0444;
 
+	loose_object_path(the_repository, filename, oid);
+	dirlen = directory_size(filename->buf);
+
+retry_create:
 	strbuf_reset(tmp);
-	strbuf_add(tmp, filename, dirlen);
-	strbuf_addstr(tmp, "tmp_obj_XXXXXX");
-	fd = git_mkstemp_mode(tmp->buf, 0444);
-	if (fd < 0 && dirlen && errno == ENOENT) {
+	if (!the_repository->objects->odb->is_temp) {
+		strbuf_add(tmp, filename->buf, dirlen);
+		object_name = "tmp_obj_XXXXXX";
+		strbuf_addstr(tmp, object_name);
+		fd = git_mkstemp_mode(tmp->buf, object_mode);
+	} else {
+		fd = open(filename->buf, O_CREAT | O_EXCL | O_RDWR, object_mode);
+	}
+
+	if (fd < 0 && dirlen && errno == ENOENT && !is_retrying) {
 		/*
 		 * Make sure the directory exists; note that the contents
 		 * of the buffer are undefined after mkstemp returns an
@@ -1926,15 +1947,15 @@ static int create_tmpfile(struct strbuf *tmp, const char *filename)
 		 * scratch.
 		 */
 		strbuf_reset(tmp);
-		strbuf_add(tmp, filename, dirlen - 1);
+		strbuf_add(tmp, filename->buf, dirlen - 1);
 		if (mkdir(tmp->buf, 0777) && errno != EEXIST)
 			return -1;
 		if (adjust_shared_perm(tmp->buf))
 			return -1;
 
 		/* Try again */
-		strbuf_addstr(tmp, "/tmp_obj_XXXXXX");
-		fd = git_mkstemp_mode(tmp->buf, 0444);
+		is_retrying = 1;
+		goto retry_create;
 	}
 	return fd;
 }
@@ -1951,14 +1972,12 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
 	static struct strbuf tmp_file = STRBUF_INIT;
 	static struct strbuf filename = STRBUF_INIT;
 
-	loose_object_path(the_repository, &filename, oid);
-
-	fd = create_tmpfile(&tmp_file, filename.buf);
+	fd = create_objfile(oid, &tmp_file, &filename);
 	if (fd < 0) {
 		if (errno == EACCES)
 			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
 		else
-			return error_errno(_("unable to create temporary file"));
+			return error_errno(_("unable to create object file"));
 	}
 
 	/* Set it up */
diff --git a/object-store.h b/object-store.h
index c5130d8baea..551639f173d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -27,6 +27,12 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This is a temporary object store, so there is no need to
+	 * create new objects via rename.
+	 */
+	int is_temp;
+
 	/*
 	 * Path to the alternative object store. If this is a relative path,
 	 * it is relative to the current working directory.
diff --git a/repository.c b/repository.c
index 710a3b4bf87..75966153b75 100644
--- a/repository.c
+++ b/repository.c
@@ -80,6 +80,8 @@ void repo_set_gitdir(struct repository *repo,
 	expand_base_dir(&repo->objects->odb->path, o->object_dir,
 			repo->commondir, "objects");
 
+	repo->objects->odb->is_temp = o->object_dir_is_temp;
+
 	free(repo->objects->alternate_db);
 	repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
 	expand_base_dir(&repo->graft_file, o->graft_file,
diff --git a/repository.h b/repository.h
index 3740c93bc0f..d3711367a6f 100644
--- a/repository.h
+++ b/repository.h
@@ -162,6 +162,7 @@ struct set_gitdir_args {
 	const char *graft_file;
 	const char *index_file;
 	const char *alternate_db;
+	int object_dir_is_temp;
 };
 
 void repo_set_gitdir(struct repository *repo, const char *root,
-- 
gitgitgadget



* [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-29  8:41               ` Elijah Newren
  2021-09-28 23:32             ` [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                               ` (7 subsequent siblings)
  9 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This patch is based on work by Elijah Newren. Any bugs, however, are my
own.

The tmp_objdir API provides the ability to create temporary object
directories, but it was designed with the goal of having subprocesses
access these object stores, followed by the main process either migrating
objects from them to the main object store or simply deleting them. The
subprocesses view the temporary directory as their primary datastore and
write to it.

Here we add the tmp_objdir_replace_primary_odb function that replaces
the current process's writable "main" object directory with the
specified one. The previous main object directory is restored in either
tmp_objdir_migrate or tmp_objdir_destroy.

For the --remerge-diff use case, add a new `will_destroy` flag in `struct
object_directory` to mark ephemeral object databases that do not require
fsync durability.
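
Replacing and restoring the primary object directory amounts to pushing a new head onto the linked list of object directories and popping it again later. The sketch below uses a hypothetical, simplified struct rather than git's real `struct object_directory`, and also shows how a `will_destroy`-style flag could gate fsync.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for an object directory entry. */
struct odb_sketch {
	struct odb_sketch *next; /* next store in the search chain (alternates) */
	char *path;
	int is_temp;             /* objects can be created in place, no rename */
	int will_destroy;        /* contents will be discarded, so skip fsync */
};

/* Push a temporary primary store; the old primary becomes the first alternate. */
static struct odb_sketch *push_temp_primary(struct odb_sketch **primary,
					    const char *dir, int will_destroy)
{
	struct odb_sketch *tmp = calloc(1, sizeof(*tmp));

	if (!tmp || !(tmp->path = strdup(dir)))
		abort();
	tmp->is_temp = 1;
	tmp->will_destroy = will_destroy;
	tmp->next = *primary;
	*primary = tmp;
	return tmp->next; /* the caller keeps this to restore later */
}

/* Pop the temporary store, checking that the old primary is right behind it. */
static void restore_primary(struct odb_sketch **primary, struct odb_sketch *prev)
{
	struct odb_sketch *cur = *primary;

	assert(cur->next == prev);
	*primary = prev;
	free(cur->path);
	free(cur);
}

static int should_fsync(const struct odb_sketch *primary, int fsync_enabled)
{
	return fsync_enabled && !primary->will_destroy;
}
```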

Add 'git prune' support for removing temporary object databases, and
make sure that their names start with tmp_ and contain an
operation-specific prefix.
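
A rough illustration of why the naming matters, assuming (as in builtin/prune.c) that the stale-file scan of the object directory only considers entries whose names begin with `tmp_`: the old `incoming-XXXXXX` template would never match, while the new one does. Both helpers are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the "%s/tmp_objdir-%s-XXXXXX" template. */
static int make_tmp_objdir_template(char *buf, size_t size,
				    const char *objdir, const char *prefix)
{
	int n = snprintf(buf, size, "%s/tmp_objdir-%s-XXXXXX", objdir, prefix);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* A prune-style scan treats any entry whose name starts with "tmp_" as a candidate. */
static int is_prune_candidate(const char *basename)
{
	return strncmp(basename, "tmp_", 4) == 0;
}
```

The resulting template would then be handed to something like mkdtemp() to create the directory.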

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/prune.c        | 22 +++++++++++++++++----
 builtin/receive-pack.c |  2 +-
 object-file.c          | 45 ++++++++++++++++++++++++++++++++++++++++--
 object-store.h         | 21 +++++++++++++++++++-
 object.c               |  2 +-
 tmp-objdir.c           | 32 +++++++++++++++++++++++++++---
 tmp-objdir.h           | 14 ++++++++++---
 7 files changed, 123 insertions(+), 15 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 02c6ab7cbaa..9c72ecf5a58 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -18,6 +18,7 @@ static int show_only;
 static int verbose;
 static timestamp_t expire;
 static int show_progress = -1;
+static struct strbuf remove_dir_buf = STRBUF_INIT;
 
 static int prune_tmp_file(const char *fullpath)
 {
@@ -26,10 +27,19 @@ static int prune_tmp_file(const char *fullpath)
 		return error("Could not stat '%s'", fullpath);
 	if (st.st_mtime > expire)
 		return 0;
-	if (show_only || verbose)
-		printf("Removing stale temporary file %s\n", fullpath);
-	if (!show_only)
-		unlink_or_warn(fullpath);
+	if (S_ISDIR(st.st_mode)) {
+		if (show_only || verbose)
+			printf("Removing stale temporary directory %s\n", fullpath);
+		if (!show_only) {
+			strbuf_addstr(&remove_dir_buf, fullpath);
+			remove_dir_recursively(&remove_dir_buf, 0);
+		}
+	} else {
+		if (show_only || verbose)
+			printf("Removing stale temporary file %s\n", fullpath);
+		if (!show_only)
+			unlink_or_warn(fullpath);
+	}
 	return 0;
 }
 
@@ -97,6 +107,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
 
 static int prune_subdir(unsigned int nr, const char *path, void *data)
 {
+	if (verbose)
+		printf("Removing directory %s\n", path);
+
 	if (!show_only)
 		rmdir(path);
 	return 0;
@@ -185,5 +198,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
 	}
 
+	strbuf_release(&remove_dir_buf);
 	return 0;
 }
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 48960a9575b..418a42ca069 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2208,7 +2208,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		strvec_push(&child.args, alt_shallow_file);
 	}
 
-	tmp_objdir = tmp_objdir_create();
+	tmp_objdir = tmp_objdir_create("incoming");
 	if (!tmp_objdir) {
 		if (err_fd > 0)
 			close(err_fd);
diff --git a/object-file.c b/object-file.c
index 49c53f801f7..1a3ad558c45 100644
--- a/object-file.c
+++ b/object-file.c
@@ -751,6 +751,44 @@ void add_to_alternates_memory(const char *reference)
 			     '\n', NULL, 0);
 }
 
+struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy)
+{
+	struct object_directory *new_odb;
+
+	/*
+	 * Make sure alternates are initialized, or else our entry may be
+	 * overwritten when they are.
+	 */
+	prepare_alt_odb(the_repository);
+
+	/*
+	 * Make a new primary odb and link the old primary ODB in as an
+	 * alternate
+	 */
+	new_odb = xcalloc(1, sizeof(*new_odb));
+	new_odb->path = xstrdup(dir);
+	new_odb->is_temp = 1;
+	new_odb->will_destroy = will_destroy;
+	new_odb->next = the_repository->objects->odb;
+	the_repository->objects->odb = new_odb;
+	return new_odb->next;
+}
+
+void restore_primary_odb(struct object_directory *restore_odb, const char *old_path)
+{
+	struct object_directory *cur_odb = the_repository->objects->odb;
+
+	if (strcmp(old_path, cur_odb->path))
+		BUG("expected %s as primary object store; found %s",
+		    old_path, cur_odb->path);
+
+	if (cur_odb->next != restore_odb)
+		BUG("we expect the old primary object store to be the first alternate");
+
+	the_repository->objects->odb = restore_odb;
+	free_object_directory(cur_odb);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
@@ -1893,8 +1931,11 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 /* Finalize a file on disk, and close it. */
 static void close_loose_object(int fd)
 {
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
+	if (!the_repository->objects->odb->will_destroy) {
+		if (fsync_object_files)
+			fsync_or_die(fd, "loose object file");
+	}
+
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
 }
diff --git a/object-store.h b/object-store.h
index 551639f173d..5bc9da6634e 100644
--- a/object-store.h
+++ b/object-store.h
@@ -31,7 +31,12 @@ struct object_directory {
 	 * This is a temporary object store, so there is no need to
 	 * create new objects via rename.
 	 */
-	int is_temp;
+	int is_temp : 8;
+
+	/*
+	 * This object store is ephemeral, so there is no need to fsync.
+	 */
+	int will_destroy : 8;
 
 	/*
 	 * Path to the alternative object store. If this is a relative path,
@@ -64,6 +69,17 @@ void add_to_alternates_file(const char *dir);
  */
 void add_to_alternates_memory(const char *dir);
 
+/*
+ * Replace the current writable object directory with the specified temporary
+ * object directory; returns the former primary object directory.
+ */
+struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy);
+
+/*
+ * Restore a previous ODB replaced by set_temporary_primary_odb.
+ */
+void restore_primary_odb(struct object_directory *restore_odb, const char *old_path);
+
 /*
  * Populate and return the loose object cache array corresponding to the
  * given object ID.
@@ -74,6 +90,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
 /* Empty the loose object cache for the specified object directory. */
 void odb_clear_loose_cache(struct object_directory *odb);
 
+/* Clear and free the specified object directory */
+void free_object_directory(struct object_directory *odb);
+
 struct packed_git {
 	struct hashmap_entry packmap_ent;
 	struct packed_git *next;
diff --git a/object.c b/object.c
index 4e85955a941..98635bc4043 100644
--- a/object.c
+++ b/object.c
@@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
 	return o;
 }
 
-static void free_object_directory(struct object_directory *odb)
+void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index b8d880e3626..366ffe28511 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -11,6 +11,7 @@
 struct tmp_objdir {
 	struct strbuf path;
 	struct strvec env;
+	struct object_directory *prev_odb;
 };
 
 /*
@@ -38,6 +39,9 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	if (t == the_tmp_objdir)
 		the_tmp_objdir = NULL;
 
+	if (!on_signal && t->prev_odb)
+		restore_primary_odb(t->prev_odb, t->path.buf);
+
 	/*
 	 * This may use malloc via strbuf_grow(), but we should
 	 * have pre-grown t->path sufficiently so that this
@@ -52,6 +56,7 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	 */
 	if (!on_signal)
 		tmp_objdir_free(t);
+
 	return err;
 }
 
@@ -121,7 +126,7 @@ static int setup_tmp_objdir(const char *root)
 	return ret;
 }
 
-struct tmp_objdir *tmp_objdir_create(void)
+struct tmp_objdir *tmp_objdir_create(const char *prefix)
 {
 	static int installed_handlers;
 	struct tmp_objdir *t;
@@ -129,11 +134,16 @@ struct tmp_objdir *tmp_objdir_create(void)
 	if (the_tmp_objdir)
 		BUG("only one tmp_objdir can be used at a time");
 
-	t = xmalloc(sizeof(*t));
+	t = xcalloc(1, sizeof(*t));
 	strbuf_init(&t->path, 0);
 	strvec_init(&t->env);
 
-	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
+	/*
+	 * Use a string starting with tmp_ so that the builtin/prune.c code
+	 * can recognize any stale objdirs left behind by a crash and delete
+	 * them.
+	 */
+	strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX", get_object_directory(), prefix);
 
 	/*
 	 * Grow the strbuf beyond any filename we expect to be placed in it.
@@ -269,6 +279,15 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
 	if (!t)
 		return 0;
 
+
+
+	if (t->prev_odb) {
+		if (the_repository->objects->odb->will_destroy)
+			BUG("migrating an ODB that was marked for destruction");
+		restore_primary_odb(t->prev_odb, t->path.buf);
+		t->prev_odb = NULL;
+	}
+
 	strbuf_addbuf(&src, &t->path);
 	strbuf_addstr(&dst, get_object_directory());
 
@@ -292,3 +311,10 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
 {
 	add_to_alternates_memory(t->path.buf);
 }
+
+void tmp_objdir_replace_primary_odb(struct tmp_objdir *t, int will_destroy)
+{
+	if (t->prev_odb)
+		BUG("the primary object database is already replaced");
+	t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
index b1e45b4c75d..75754cbfba6 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -10,7 +10,7 @@
  *
  * Example:
  *
- *	struct tmp_objdir *t = tmp_objdir_create();
+ *	struct tmp_objdir *t = tmp_objdir_create("incoming");
  *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
  *	    !tmp_objdir_migrate(t))
  *		printf("success!\n");
@@ -22,9 +22,10 @@
 struct tmp_objdir;
 
 /*
- * Create a new temporary object directory; returns NULL on failure.
+ * Create a new temporary object directory with the specified prefix;
+ * returns NULL on failure.
  */
-struct tmp_objdir *tmp_objdir_create(void);
+struct tmp_objdir *tmp_objdir_create(const char *prefix);
 
 /*
  * Return a list of environment strings, suitable for use with
@@ -51,4 +52,11 @@ int tmp_objdir_destroy(struct tmp_objdir *);
  */
 void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
 
+/*
+ * Replaces the main object store in the current process with the temporary
+ * object directory and makes the former main object store an alternate.
+ * If will_destroy is nonzero, the object directory may not be migrated.
+ */
+void tmp_objdir_replace_primary_odb(struct tmp_objdir *, int will_destroy);
+
 #endif /* TMP_OBJDIR_H */
-- 
gitgitgadget



* [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                               ` (6 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'. This also makes the variable easier to
  find in the debugger, since the name is more distinctive.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
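
A toy model of the bug being fixed, assuming (as in bulk-checkin.c) that finish_bulk_checkin ends by zeroing the whole state with memset(). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Before: the flag lives inside the state that gets zeroed. */
struct state_before {
	unsigned plugged:1;
	int nr_written;
};

/* After: the flag is a separate static, untouched by the reset. */
static int plugged_after;
struct state_after {
	int nr_written;
};

/* Hypothetical stand-ins for the "zero the state" step of finishing a pack. */
static void finish_before(struct state_before *s)
{
	memset(s, 0, sizeof(*s));
}

static void finish_after(struct state_after *s)
{
	memset(s, 0, sizeof(*s));
}
```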

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 8785b2ac806..6ae18401e04 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_tmp_packfile(struct strbuf *basename,
 				const char *pack_tmp_name,
@@ -277,21 +277,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget



* [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (2 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
                               ` (5 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which reliably issues a pagecache writeback request as of version 5.2.
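
A minimal sketch of a writeout-only vs. hardware-flush primitive along the lines described above. It is an illustration under stated assumptions, not git's wrapper: `sketch_fsync` and its enum are hypothetical names, and the Windows mechanism is not shown (other platforms report ENOSYS so the caller can fall back to a full flush).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

enum sketch_fsync_action { SKETCH_WRITEOUT_ONLY, SKETCH_HARDWARE_FLUSH };

static int sketch_fsync(int fd, enum sketch_fsync_action action)
{
	switch (action) {
	case SKETCH_WRITEOUT_ONLY:
#if defined(__APPLE__)
		/* On macOS, fsync(2) writes back the page cache without a drive-cache flush. */
		return fsync(fd);
#elif defined(__linux__)
		/* Offset 0 and length 0 cover the whole file; wait for writeback to finish. */
		return sync_file_range(fd, 0, 0,
				       SYNC_FILE_RANGE_WAIT_BEFORE |
				       SYNC_FILE_RANGE_WRITE |
				       SYNC_FILE_RANGE_WAIT_AFTER);
#else
		errno = ENOSYS; /* the caller should fall back to a full flush */
		return -1;
#endif
	case SKETCH_HARDWARE_FLUSH:
#if defined(__APPLE__)
		return fcntl(fd, F_FULLFSYNC);
#else
		return fsync(fd);
#endif
	}
	errno = EINVAL;
	return -1;
}
```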

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.
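
The option is effectively tri-state. A sketch of the parsing order, checking for the literal `batch` before falling back to boolean parsing; `parse_bool_sketch` is a hypothetical, deliberately reduced stand-in for git_config_bool(), which accepts more spellings (for example, arbitrary integers).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>

enum fsync_mode_sketch { MODE_OFF, MODE_ON, MODE_BATCH };

/* Reduced boolean parser: returns 1, 0, or -1 for an unrecognized value. */
static int parse_bool_sketch(const char *value)
{
	if (!value) /* a bare "fsyncObjectFiles" key with no "=" means true */
		return 1;
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!*value || !strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return -1;
}

/* Check for "batch" (case-sensitively) first, then fall back to a boolean. */
static int parse_fsync_mode_sketch(const char *value, enum fsync_mode_sketch *out)
{
	int b;

	if (value && !strcmp(value, "batch")) {
		*out = MODE_BATCH;
		return 0;
	}
	b = parse_bool_sketch(value);
	if (b < 0)
		return -1;
	*out = b ? MODE_ON : MODE_OFF;
	return 0;
}
```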

When the new mode is enabled, we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the case today,
   but may be a good extension to those components.
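
The per-object and end-of-transaction steps can be sketched as a single POSIX function. This is a simplified model under several assumptions: flat `staging` and `dest` directories instead of a tmp-objdir and the loose-object fan-out, hypothetical helper names, no error cleanup, and, as this message argues, the expectation that one flush of the drive cache also covers the earlier writebacks.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Page-cache writeback without a drive-cache flush where possible. */
static int writeout_only(int fd)
{
#ifdef __linux__
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
#else
	return fsync(fd); /* conservative fallback: may also flush the drive cache */
#endif
}

/* A real flush of the drive's writeback cache. */
static int hardware_flush(int fd)
{
#ifdef __APPLE__
	return fcntl(fd, F_FULLFSYNC);
#else
	return fsync(fd);
#endif
}

static int batch_add(const char *staging, const char *dest,
		     const char *const names[], const char *const data[], int n)
{
	char src[4096], dst[4096];
	int i, fd;

	/* Per object: create it in the staging dir, then request writeback. */
	for (i = 0; i < n; i++) {
		size_t len = strlen(data[i]);

		snprintf(src, sizeof(src), "%s/%s", staging, names[i]);
		fd = open(src, O_CREAT | O_EXCL | O_WRONLY, 0444);
		if (fd < 0)
			return -1;
		if (write(fd, data[i], len) != (ssize_t)len ||
		    writeout_only(fd) || close(fd))
			return -1;
	}

	/* End of transaction, step 1: one flush, issued on a dummy file. */
	snprintf(src, sizeof(src), "%s/bulk_fsync_XXXXXX", staging);
	fd = mkstemp(src);
	if (fd < 0)
		return -1;
	if (hardware_flush(fd) || close(fd) || unlink(src))
		return -1;

	/* Step 2: only now move each object to its final name. */
	for (i = 0; i < n; i++) {
		snprintf(src, sizeof(src), "%s/%s", staging, names[i]);
		snprintf(dst, sizeof(dst), "%s/%s", dest, names[i]);
		if (rename(src, dst))
			return -1;
	}
	return 0;
}
```

The key ordering choice is that no object reaches its final name until after the single flush, so a crash can leave stale files in the staging area but should not leave a visible object with incomplete contents.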

On a filesystem with a singular journal that is updated during name
operations (e.g., create, link, rename), such as NTFS, HFS+, or XFS,
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, there
was no guarantee of durability on macOS, since a simple fsync(2) call
does not flush any hardware caches.

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times are reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 29 ++++++++++++---
 Makefile                      |  6 +++
 bulk-checkin.c                | 70 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 +
 cache.h                       |  8 +++-
 config.c                      |  7 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 ++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 12 +++++-
 tmp-objdir.c                  |  2 -
 wrapper.c                     | 44 ++++++++++++++++++++++
 write-or-die.c                |  2 +-
 14 files changed, 187 insertions(+), 13 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..200b4d9f06e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,29 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls loose objects in the object store, so updates to any
+	refs or the index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each loose object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware. Git does not currently fsync parent directories for
+  newly-added files, so some filesystems may still allow data to be lost on
+  system crash.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write loose object data with a minimal set of FLUSH
+  CACHE (or equivalent) commands sent to the storage controller. If the
+  operating system interfaces are not available, this mode behaves the same as
+  `true`. This mode is expected to be as safe as `true` on macOS for repos
+  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
+  ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index a9f9b689f0c..313b3dc7cd6 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1874,6 +1876,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ae18401e04..e6c830f9c0f 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -79,6 +85,34 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur.  The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+	}
+
+	if (bulk_fsync_objdir)
+		tmp_objdir_migrate(bulk_fsync_objdir);
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -273,6 +307,26 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
+
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * before as part of do_batch_fsync.
+	 * as part of do_batch_fsync.
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		assert(the_repository->objects->odb->is_temp);
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -287,6 +341,20 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * Create a temporary object directory if the current
+	 * object directory is not already temporary.
+	 */
+	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
+	    !the_repository->objects->odb->is_temp) {
+		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
+
+		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -296,4 +364,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index f6295f3b048..1ed8137b5e6 100644
--- a/cache.h
+++ b/cache.h
@@ -984,7 +984,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum fsync_object_files_mode {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum fsync_object_files_mode fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 2edf835262f..8315d020eeb 100644
--- a/config.c
+++ b/config.c
@@ -1506,7 +1506,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE=])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index 30fca67e6d6..371a73c1e30 100644
--- a/environment.c
+++ b/environment.c
@@ -42,7 +42,7 @@ const char *git_attributes_file;
 const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum fsync_object_files_mode fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index 7c99eef6612..9daee873782 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1213,6 +1213,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+	FSYNC_WRITEOUT_ONLY,
+	FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index 1a3ad558c45..8ea1348f0db 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1932,8 +1932,18 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 static void close_loose_object(int fd)
 {
 	if (!the_repository->objects->odb->will_destroy) {
-		if (fsync_object_files)
+		switch (fsync_object_files) {
+		case FSYNC_OBJECT_FILES_OFF:
+			break;
+		case FSYNC_OBJECT_FILES_ON:
 			fsync_or_die(fd, "loose object file");
+			break;
+		case FSYNC_OBJECT_FILES_BATCH:
+			fsync_loose_object_bulk_checkin(fd);
+			break;
+		default:
+			BUG("Invalid fsync_object_files mode.");
+		}
 	}
 
 	if (close(fd) != 0)
diff --git a/tmp-objdir.c b/tmp-objdir.c
index 366ffe28511..c26cb5eafee 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -279,8 +279,6 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
 	if (!t)
 		return 0;
 
-
-
 	if (t->prev_odb) {
 		if (the_repository->objects->odb->will_destroy)
 			BUG("migrating and ODB that was marked for destruction");
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * on macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index 0b1ec8190b6..cc8291d9794 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (3 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                               ` (4 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function being called
has been available since Windows 8. If the function is not available,
we return -1 and Git falls back to doing a full fsync.

The operating system is told to write out file data only, without
issuing a hardware flush primitive. A later full fsync will cause the
metadata log to be flushed and then the disk cache to be flushed on
NTFS and ReFS. Other filesystems will treat this as a full flush
operation.
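
The fallback described above can be sketched as follows. This is an
illustration under stated assumptions, not the code from this series:
writeout_or_full_flush() and the stub primitives are hypothetical, and
the flush functions are passed in as pointers so the control flow can be
shown without any Windows API.

```c
#include <assert.h>
#include <errno.h>

/* A flush primitive: returns 0 on success, or -1 with errno set. */
typedef int (*flush_fn)(int fd);

/*
 * Hypothetical helper: request a writeout-only flush first and, if that
 * fails for any reason (for example ENOSYS because NtFlushBuffersFileEx
 * could not be loaded), fall back to a full flush of the same descriptor.
 * *used_fallback reports which path was taken.
 */
static int writeout_or_full_flush(int fd, flush_fn writeout_only,
				  flush_fn full_flush, int *used_fallback)
{
	*used_fallback = 0;
	if (writeout_only(fd) >= 0)
		return 0;
	*used_fallback = 1;
	return full_flush(fd);
}

/* Stub primitives, for illustration only. */
static int writeout_unsupported(int fd)
{
	(void)fd;
	errno = ENOSYS;	/* as when NtFlushBuffersFileEx cannot be loaded */
	return -1;
}

static int writeout_ok(int fd)
{
	(void)fd;
	return 0;
}

static int full_flush_ok(int fd)
{
	(void)fd;
	return 0;
}
```

Passing the primitives in keeps the decision logic separate from the
platform code, which mirrors how git_fsync() hides the per-OS details
behind one call.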

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 28 ++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..75324c24ee7
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,28 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+	IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+	DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+	if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+	}
+
+	memset(&io_status, 0, sizeof(io_status));
+	if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index bb4f9f043ce..1a1e2fba9c9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 
-- 
gitgitgadget


* [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (4 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
                               ` (3 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin in update-index to speed up adding new
objects to the object database by leveraging the pack functionality and
the new bulk-fsync functionality.

There is some risk with this change: under batch fsync, the new object
files are not visible until update-index has completed entirely. A
caller depending on seeing those objects earlier is unlikely, since any
tool invoking update-index and expecting to see the objects would have
to synchronize with the update-index process after passing it a file
path.
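
The visibility trade-off can be pictured with a toy model in which
objects added while plugged are only counted as pending and become
visible in one step at unplug time. The struct and function names are
hypothetical; they only illustrate the ordering that
plug_bulk_checkin()/unplug_bulk_checkin() impose and do not model how
Git actually stages the objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; these are not Git's bulk-checkin.c names. */
struct bulk_session {
	int plugged;
	size_t pending;	/* written out, not yet flushed or visible */
	size_t visible;	/* objects other processes can see */
};

static void session_plug(struct bulk_session *s)
{
	s->plugged = 1;
}

static void session_add_object(struct bulk_session *s)
{
	if (s->plugged)
		s->pending++;	/* defer the flush and the publish step */
	else
		s->visible++;	/* unbatched: flushed and published at once */
}

static void session_unplug(struct bulk_session *s)
{
	/* one flush covers the whole batch, then everything is published */
	s->visible += s->pending;
	s->pending = 0;
	s->plugged = 0;
}
```

Between plug and unplug, a concurrent reader in this model sees none of
the new objects, which is exactly the window the risk paragraph above
describes.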

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..dc7368bb1ee 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1088,6 +1089,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1168,6 +1172,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "
-- 
gitgitgadget


* [PATCH v7 7/9] unpack-objects: use the bulk-checkin infrastructure
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (5 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
                               ` (2 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.
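
To see why batching pays off when many loose objects are unpacked, a
rough cost model counts hardware flush commands per mode. This is a
simplification that assumes one hardware flush per fsync in the "on"
mode and a single flush at unplug time in batch mode; hardware_flushes()
is a hypothetical helper that only reuses the mode names from cache.h.

```c
#include <assert.h>
#include <stddef.h>

/* Same three modes as enum fsync_object_files_mode in cache.h. */
enum fsync_object_files_mode {
	FSYNC_OBJECT_FILES_OFF,
	FSYNC_OBJECT_FILES_ON,
	FSYNC_OBJECT_FILES_BATCH
};

/*
 * Hypothetical cost model: hardware flush commands issued when
 * nr_objects loose objects are written inside one plug/unplug pair.
 */
static size_t hardware_flushes(enum fsync_object_files_mode mode,
			       size_t nr_objects)
{
	switch (mode) {
	case FSYNC_OBJECT_FILES_OFF:
		return 0;
	case FSYNC_OBJECT_FILES_ON:
		return nr_objects;		/* one fsync per loose object */
	case FSYNC_OBJECT_FILES_BATCH:
		return nr_objects ? 1 : 0;	/* one flush at unplug time */
	}
	return 0;
}
```

Under this model the saving grows linearly with the number of objects,
which is why unpack_all() wraps its whole loop in a single plug/unplug
pair rather than flushing per object.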

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295ba..51eb4f7b531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)
-- 
gitgitgadget


* [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (6 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-09-28 23:32             ` [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add test cases to exercise batch mode for:
 * 'git add'
 * 'git stash'
 * 'git update-index'
 * 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper, lib-unique-files.sh. The
goal of this library is to create a tree of files whose oids differ
from those of any other files that may have been created in the current
test repo. This ensures each test really validates that an object was
added, rather than passing because an identical object was already in
the repo.
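
The content scheme used by test_create_unique_files can be restated in C
to make the uniqueness argument explicit. unique_file_content() is a
hypothetical helper written for illustration; the shell function in the
patch below is the actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical C rendering of the scheme in test_create_unique_files:
 * file j of directory i (both 1-based) gets the line "<basedata>.<counter>",
 * where counter keeps increasing across directories, so no two files in one
 * call share contents, and basedata (taken from test_tick) differs between
 * calls.
 */
static int unique_file_content(char *buf, size_t len, long basedata,
			       int dir, int file, int files_per_dir)
{
	int counter = (dir - 1) * files_per_dir + file;

	return snprintf(buf, len, "%ld.%d\n", basedata, counter);
}
```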

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/lib-unique-files.sh  | 36 ++++++++++++++++++++++++++++++++++++
 t/t3700-add.sh         | 20 ++++++++++++++++++++
 t/t3903-stash.sh       | 14 ++++++++++++++
 t/t5300-pack-object.sh | 30 +++++++++++++++++++-----------
 4 files changed, 89 insertions(+), 11 deletions(-)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a7de4ca8512
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,36 @@
+# Helper to create files with unique contents
+
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
+#					 each in my_dir, all with unique
+#					 contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "expected 3 parameters, got $#"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+	local counter=0
+	test_tick
+	local basedata=$test_tick
+
+
+	rm -rf $basedir
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir"
+		for j in $(test_seq $files)
+		do
+			counter=$((counter + 1))
+			echo "$basedata.$counter"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..36049a53ff7 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.'
 
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -33,6 +35,24 @@ test_expect_success \
     'Test that "git add -- -q" works' \
     'touch -- -q && git add -- -q'
 
+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files2 &&
+	find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
+	rm -f fsynced_files2 &&
+	git ls-files --stage fsync-files2/ > fsynced_files2 &&
+	test_line_count = 8 fsynced_files2 &&
+	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 873aa56e359..2fc819e5584 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index e13a8842075..38663dc1393 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -162,23 +162,23 @@ test_expect_success 'pack-objects with bogus arguments' '
 
 check_unpack () {
 	test_when_finished "rm -rf git2" &&
-	git init --bare git2 &&
-	git -C git2 unpack-objects -n <"$1".pack &&
-	git -C git2 unpack-objects <"$1".pack &&
-	(cd .git && find objects -type f -print) |
-	while read path
-	do
-		cmp git2/$path .git/$path || {
-			echo $path differs.
-			return 1
-		}
-	done
+	git $2 init --bare git2 &&
+	(
+		git $2 -C git2 unpack-objects -n <"$1".pack &&
+		git $2 -C git2 unpack-objects <"$1".pack &&
+		git $2 -C git2 cat-file --batch-check="%(objectname)"
+	) <obj-list >current &&
+	cmp obj-list current
 }
 
 test_expect_success 'unpack without delta' '
 	check_unpack test-1-${packname_1}
 '
 
+test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
+	check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with REF_DELTA' '
 	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
 	check_deltas stderr -gt 0
@@ -188,6 +188,10 @@ test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
+test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
+	check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with OFS_DELTA' '
 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
 			<obj-list 2>stderr) &&
@@ -198,6 +202,10 @@ test_expect_success 'unpack with OFS_DELTA' '
 	check_unpack test-3-${packname_3}
 '
 
+test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
+	check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
 		defined($_ = -s $_) or die for @ARGV;
-- 
gitgitgadget


* [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (7 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-09-28 23:32             ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-09-28 23:32 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh, Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of many new
objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget

* Re: [PATCH v7 1/9] object-file.c: do not rename in a temp odb
  2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
@ 2021-09-28 23:55               ` Jeff King
  2021-09-29  0:10                 ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Jeff King @ 2021-09-28 23:55 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Tue, Sep 28, 2021 at 11:32:43PM +0000, Neeraj Singh via GitGitGadget wrote:

> If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH
> being set, create object files with their final names. This avoids
> an extra rename beyond what is needed to merge the temporary ODB in
> tmp_objdir_migrate.

What's our goal here? Is it the performance of avoiding the extra
rename()? Or do we benefit from the simplicity of avoiding it?

If the former, do we have measurements on how much this matters?

If the latter, what does the simplicity buy us? I thought maybe it would
make reasoning about fsync() easier, because we don't have to worry
about fsyncing the rename. But we'd eventually have to rename() into the
real object directory anyway.

The reason I want to push back is...

> Creating an object file with the expected final name should be okay
> since the git process writing to the temporary object store is the
> only writer, and it only invokes write_loose_object/create_object_file
> after checking that the object doesn't exist.

...this seems like a kind-of dangerous assumption. Most of the time,
yeah, I'd expect just a single process to be writing. But one of the
things that happens during the receive-pack quarantine is that we run
hooks, which can run any set of arbitrary Git commands, including
simultaneous readers and writers. It seems like we might be introducing
subtle races there.

-Peff
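
The concern above is about what a concurrent reader can observe. The
usual way to keep a partially written file out of view is to write it
under a unique temporary name in the same directory and rename() it into
place, since POSIX rename() replaces the target atomically. Below is a
minimal POSIX sketch of that pattern; write_file_atomically() and the
tmp_obj_ prefix are hypothetical, and this is not Git's own
object-writing code. A fully durable version would also fsync() the
containing directory after the rename.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write data to dir/name so readers never see a partial file: create a
 * unique temp file in the same directory, write and fsync it, then rename()
 * it over the final name. A short write is treated as failure.
 */
static int write_file_atomically(const char *dir, const char *name,
				 const char *data, size_t len)
{
	char tmp[4096], final_path[4096];
	int fd;

	if (snprintf(tmp, sizeof(tmp), "%s/tmp_obj_XXXXXX", dir) >= (int)sizeof(tmp) ||
	    snprintf(final_path, sizeof(final_path), "%s/%s", dir, name) >= (int)sizeof(final_path))
		return -1;

	fd = mkstemp(tmp);	/* unique name, so concurrent writers never collide */
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, final_path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```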

* Re: [PATCH v7 1/9] object-file.c: do not rename in a temp odb
  2021-09-28 23:55               ` Jeff King
@ 2021-09-29  0:10                 ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-29  0:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

On Tue, Sep 28, 2021 at 4:55 PM Jeff King <peff@peff.net> wrote:
>
> On Tue, Sep 28, 2021 at 11:32:43PM +0000, Neeraj Singh via GitGitGadget wrote:
>
> > If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH
> > being set, create object files with their final names. This avoids
> > an extra rename beyond what is needed to merge the temporary ODB in
> > tmp_objdir_migrate.
>
> What's our goal here? Is it the performance of avoiding the extra
> rename()? Or do we benefit from the simplicity of avoiding it?
>
> If the former, do we have measurements on how much this matters?
>
> If the latter, what does the simplicity buy us? I thought maybe it would
> make reasoning about fsync() easier, because we don't have to worry
> about fsyncing the rename. But we'd eventually have to rename() into the
> real object directory anyway.
>
> The reason I want to push back is...
>
> > Creating an object file with the expected final name should be okay
> > since the git process writing to the temporary object store is the
> > only writer, and it only invokes write_loose_object/create_object_file
> > after checking that the object doesn't exist.
>
> ...this seems like a kind-of dangerous assumption. Most of the time,
> yeah, I'd expect just a single process to be writing. But one of the
> things that happens during the receive-pack quarantine is that we run
> hooks, which can run any set of arbitrary Git commands, including
> simultaneous readers and writers. It seems like we might be introducing
> subtle races there.
>
> -Peff

Yes, the main goal was to avoid an extra rename. I see your concern
and I guess we have no way of knowing if someone is really going to
get bitten by this or not. On the other hand, we do know in the case
of batch_fsync that only Git is running against the objdir at that
time.  I'll remove this change since it's not where the real perf
benefit is.

Thanks,
Neeraj

* Re: [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases
  2021-09-28 23:32             ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
@ 2021-09-29  8:41               ` Elijah Newren
  2021-09-29 16:40                 ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Elijah Newren @ 2021-09-29  8:41 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: Git Mailing List, Neeraj-Personal, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

Hi,

Thanks for working on this, and for moving this up in your series near
the beginning.

On Tue, Sep 28, 2021 at 4:34 PM Neeraj Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Neeraj Singh <neerajsi@microsoft.com>
>
> This patch is based on work by Elijah Newren. Any bugs however are my
> own.

This kind of information is often included in a commit message via a
trailer such as:
    Based-on-patch-by: Elijah Newren <newren@gmail.com>
or Helped-by: or Co-authored-by: or Contributions-by: .

> The tmp_objdir API provides the ability to create temporary object
> directories, but was designed with the goal of having subprocesses
> access these object stores, followed by the main process migrating
> objects from it to the main object store or just deleting it.  The
> subprocesses would view it as their primary datastore and write to it.
>
> Here we add the tmp_objdir_replace_primary_odb function that replaces
> the current process's writable "main" object directory with the
> specified one. The previous main object directory is restored in either
> tmp_objdir_migrate or tmp_objdir_destroy.
>
> For the --remerge-diff usecase, add a new `will_destroy` flag in `struct
> object_database` to mark ephemeral object databases that do not require
> fsync durability.
>
> Add 'git prune' support for removing temporary object databases, and
> make sure that they have a name starting with tmp_ and containing an
> operation-specific name.
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> ---
>  builtin/prune.c        | 22 +++++++++++++++++----
>  builtin/receive-pack.c |  2 +-
>  object-file.c          | 45 ++++++++++++++++++++++++++++++++++++++++--
>  object-store.h         | 21 +++++++++++++++++++-
>  object.c               |  2 +-
>  tmp-objdir.c           | 32 +++++++++++++++++++++++++++---
>  tmp-objdir.h           | 14 ++++++++++---
>  7 files changed, 123 insertions(+), 15 deletions(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 02c6ab7cbaa..9c72ecf5a58 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -18,6 +18,7 @@ static int show_only;
>  static int verbose;
>  static timestamp_t expire;
>  static int show_progress = -1;
> +static struct strbuf remove_dir_buf = STRBUF_INIT;
>
>  static int prune_tmp_file(const char *fullpath)
>  {
> @@ -26,10 +27,19 @@ static int prune_tmp_file(const char *fullpath)
>                 return error("Could not stat '%s'", fullpath);
>         if (st.st_mtime > expire)
>                 return 0;
> -       if (show_only || verbose)
> -               printf("Removing stale temporary file %s\n", fullpath);
> -       if (!show_only)
> -               unlink_or_warn(fullpath);
> +       if (S_ISDIR(st.st_mode)) {
> +               if (show_only || verbose)
> +                       printf("Removing stale temporary directory %s\n", fullpath);
> +               if (!show_only) {
> +                       strbuf_addstr(&remove_dir_buf, fullpath);
> +                       remove_dir_recursively(&remove_dir_buf, 0);
> +               }
> +       } else {
> +               if (show_only || verbose)
> +                       printf("Removing stale temporary file %s\n", fullpath);
> +               if (!show_only)
> +                       unlink_or_warn(fullpath);
> +       }
>         return 0;
>  }
>
> @@ -97,6 +107,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
>
>  static int prune_subdir(unsigned int nr, const char *path, void *data)
>  {
> +       if (verbose)
> +               printf("Removing directory %s\n", path);
> +
>         if (!show_only)
>                 rmdir(path);
>         return 0;
> @@ -185,5 +198,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>                 prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
>         }
>
> +       strbuf_release(&remove_dir_buf);
>         return 0;
>  }
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 48960a9575b..418a42ca069 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -2208,7 +2208,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>                 strvec_push(&child.args, alt_shallow_file);
>         }
>
> -       tmp_objdir = tmp_objdir_create();
> +       tmp_objdir = tmp_objdir_create("incoming");
>         if (!tmp_objdir) {
>                 if (err_fd > 0)
>                         close(err_fd);
> diff --git a/object-file.c b/object-file.c
> index 49c53f801f7..1a3ad558c45 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -751,6 +751,44 @@ void add_to_alternates_memory(const char *reference)
>                              '\n', NULL, 0);
>  }
>
> +struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy)
> +{
> +       struct object_directory *new_odb;
> +
> +       /*
> +        * Make sure alternates are initialized, or else our entry may be
> +        * overwritten when they are.
> +        */
> +       prepare_alt_odb(the_repository);

This implicit dependence on the_repository is unfortunate.  My
versions passed the repository parameter explicitly.  While my
remerge-diff code doesn't really make use of that currently, it could
make sense to have temporary object stores for a submodule and do
remerge-diff work on them.  You've also got two more uses of
the_repository later in this function.
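
The suggestion amounts to threading a `struct repository *` through these helpers instead of reaching for `the_repository`. The sketch below assumes heavily simplified stand-in types; they are not git's real struct definitions. `repo_set_temporary_primary_odb` and `repo_restore_primary_odb` are hypothetical names. The sketch omits the patch's prepare_alt_odb() call and returns -1 where the patch calls BUG().

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins; NOT git's real definitions. */
struct object_directory {
    char *path;
    int will_destroy;
    struct object_directory *next;   /* alternates follow the primary */
};

struct raw_object_store {
    struct object_directory *odb;    /* primary (writable) ODB heads the list */
};

struct repository {
    struct raw_object_store *objects;
};

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Make `dir` the primary ODB of `r`, demoting the old primary to the
 * first alternate. Returns the old primary, or NULL on allocation failure.
 */
static struct object_directory *
repo_set_temporary_primary_odb(struct repository *r, const char *dir,
                               int will_destroy)
{
    struct object_directory *new_odb = calloc(1, sizeof(*new_odb));
    if (!new_odb)
        return NULL;
    new_odb->path = dup_string(dir);
    if (!new_odb->path) {
        free(new_odb);
        return NULL;
    }
    new_odb->will_destroy = will_destroy;
    new_odb->next = r->objects->odb;
    r->objects->odb = new_odb;
    return new_odb->next;
}

/* Undo the above for the same repository; -1 if the list is not as expected. */
static int repo_restore_primary_odb(struct repository *r,
                                    struct object_directory *restore_odb)
{
    struct object_directory *cur = r->objects->odb;
    if (cur->next != restore_odb)
        return -1;
    r->objects->odb = restore_odb;
    free(cur->path);
    free(cur);
    return 0;
}
```

With this shape, a submodule's `struct repository` could be passed in place of the superproject's.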

> +
> +       /*
> +        * Make a new primary odb and link the old primary ODB in as an
> +        * alternate
> +        */
> +       new_odb = xcalloc(1, sizeof(*new_odb));
> +       new_odb->path = xstrdup(dir);
> +       new_odb->is_temp = 1;
> +       new_odb->will_destroy = will_destroy;
> +       new_odb->next = the_repository->objects->odb;
> +       the_repository->objects->odb = new_odb;
> +       return new_odb->next;
> +}
> +
> +void restore_primary_odb(struct object_directory *restore_odb, const char *old_path)
> +{
> +       struct object_directory *cur_odb = the_repository->objects->odb;

Another use of the_repository, and some more below.

> +
> +       if (strcmp(old_path, cur_odb->path))
> +               BUG("expected %s as primary object store; found %s",
> +                   old_path, cur_odb->path);
> +
> +       if (cur_odb->next != restore_odb)
> +               BUG("we expect the old primary object store to be the first alternate");
> +
> +       the_repository->objects->odb = restore_odb;
> +       free_object_directory(cur_odb);
> +}
> +
>  /*
>   * Compute the exact path an alternate is at and returns it. In case of
>   * error NULL is returned and the human readable error is added to `err`
> @@ -1893,8 +1931,11 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  /* Finalize a file on disk, and close it. */
>  static void close_loose_object(int fd)
>  {
> -       if (fsync_object_files)
> -               fsync_or_die(fd, "loose object file");
> +       if (!the_repository->objects->odb->will_destroy) {
> +               if (fsync_object_files)
> +                       fsync_or_die(fd, "loose object file");
> +       }
> +
>         if (close(fd) != 0)
>                 die_errno(_("error when closing loose object file"));
>  }
> diff --git a/object-store.h b/object-store.h
> index 551639f173d..5bc9da6634e 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -31,7 +31,12 @@ struct object_directory {
>          * This is a temporary object store, so there is no need to
>          * create new objects via rename.
>          */
> -       int is_temp;
> +       int is_temp : 8;
> +
> +       /*
> +        * This object store is ephemeral, so there is no need to fsync.
> +        */
> +       int will_destroy : 8;

Why 8 bits wide rather than 1?  I thought these were boolean
values...was I mistaken?

(Also, if boolean and compressing to 1 bit, should probably be
unsigned rather than signed.)
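
Why `unsigned` is the portable choice for a 1-bit flag:

* Whether a plain `int` bit-field is signed is implementation-defined.
* A signed 1-bit field can only represent -1 and 0.

The sketch below uses a hypothetical flags struct. Assigning 2 to a 1-bit unsigned field stores 0, because the value is reduced modulo 2, so the setter normalizes with `!!`.

```c
/* Hypothetical flags struct: one-bit unsigned fields hold exactly 0 or 1. */
struct odb_flag_bits {
    unsigned int is_temp : 1;
    unsigned int will_destroy : 1;
};

/*
 * Normalize before storing: assigning 2 to a 1-bit unsigned field would
 * store 2 mod 2 == 0, silently turning "true" into "false".
 */
static void set_will_destroy(struct odb_flag_bits *f, int value)
{
    f->will_destroy = !!value;
}
```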

>         /*
>          * Path to the alternative object store. If this is a relative path,
> @@ -64,6 +69,17 @@ void add_to_alternates_file(const char *dir);
>   */
>  void add_to_alternates_memory(const char *dir);
>
> +/*
> + * Replace the current writable object directory with the specified temporary
> + * object directory; returns the former primary object directory.
> + */
> +struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy);
> +
> +/*
> + * Restore a previous ODB replaced by set_temporary_main_odb.
> + */
> +void restore_primary_odb(struct object_directory *restore_odb, const char *old_path);
> +
>  /*
>   * Populate and return the loose object cache array corresponding to the
>   * given object ID.
> @@ -74,6 +90,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
>  /* Empty the loose object cache for the specified object directory. */
>  void odb_clear_loose_cache(struct object_directory *odb);
>
> +/* Clear and free the specified object directory */
> +void free_object_directory(struct object_directory *odb);
> +
>  struct packed_git {
>         struct hashmap_entry packmap_ent;
>         struct packed_git *next;
> diff --git a/object.c b/object.c
> index 4e85955a941..98635bc4043 100644
> --- a/object.c
> +++ b/object.c
> @@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
>         return o;
>  }
>
> -static void free_object_directory(struct object_directory *odb)
> +void free_object_directory(struct object_directory *odb)
>  {
>         free(odb->path);
>         odb_clear_loose_cache(odb);
> diff --git a/tmp-objdir.c b/tmp-objdir.c
> index b8d880e3626..366ffe28511 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -11,6 +11,7 @@
>  struct tmp_objdir {
>         struct strbuf path;
>         struct strvec env;
> +       struct object_directory *prev_odb;
>  };
>
>  /*
> @@ -38,6 +39,9 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
>         if (t == the_tmp_objdir)
>                 the_tmp_objdir = NULL;
>
> +       if (!on_signal && t->prev_odb)
> +               restore_primary_odb(t->prev_odb, t->path.buf);
> +
>         /*
>          * This may use malloc via strbuf_grow(), but we should
>          * have pre-grown t->path sufficiently so that this
> @@ -52,6 +56,7 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
>          */
>         if (!on_signal)
>                 tmp_objdir_free(t);
> +
>         return err;
>  }
>
> @@ -121,7 +126,7 @@ static int setup_tmp_objdir(const char *root)
>         return ret;
>  }
>
> -struct tmp_objdir *tmp_objdir_create(void)
> +struct tmp_objdir *tmp_objdir_create(const char *prefix)
>  {
>         static int installed_handlers;
>         struct tmp_objdir *t;
> @@ -129,11 +134,16 @@ struct tmp_objdir *tmp_objdir_create(void)
>         if (the_tmp_objdir)
>                 BUG("only one tmp_objdir can be used at a time");
>
> -       t = xmalloc(sizeof(*t));
> +       t = xcalloc(1, sizeof(*t));
>         strbuf_init(&t->path, 0);
>         strvec_init(&t->env);
>
> -       strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
> +       /*
> +        * Use a string starting with tmp_ so that the builtin/prune.c code
> +        * can recognize any stale objdirs left behind by a crash and delete
> +        * them.
> +        */
> +       strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX", get_object_directory(), prefix);
>
>         /*
>          * Grow the strbuf beyond any filename we expect to be placed in it.
> @@ -269,6 +279,15 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
>         if (!t)
>                 return 0;
>
> +
> +

Why so many blank lines?

> +       if (t->prev_odb) {
> +               if (the_repository->objects->odb->will_destroy)

Another implicit dependence on the_repository.

> +                       BUG("migrating and ODB that was marked for destruction");
> +               restore_primary_odb(t->prev_odb, t->path.buf);
> +               t->prev_odb = NULL;
> +       }
> +
>         strbuf_addbuf(&src, &t->path);
>         strbuf_addstr(&dst, get_object_directory());
>
> @@ -292,3 +311,10 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
>  {
>         add_to_alternates_memory(t->path.buf);
>  }
> +
> +void tmp_objdir_replace_primary_odb(struct tmp_objdir *t, int will_destroy)
> +{
> +       if (t->prev_odb)
> +               BUG("the primary object database is already replaced");
> +       t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
> +}
> diff --git a/tmp-objdir.h b/tmp-objdir.h
> index b1e45b4c75d..75754cbfba6 100644
> --- a/tmp-objdir.h
> +++ b/tmp-objdir.h
> @@ -10,7 +10,7 @@
>   *
>   * Example:
>   *
> - *     struct tmp_objdir *t = tmp_objdir_create();
> + *     struct tmp_objdir *t = tmp_objdir_create("incoming");
>   *     if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
>   *         !tmp_objdir_migrate(t))
>   *             printf("success!\n");
> @@ -22,9 +22,10 @@
>  struct tmp_objdir;
>
>  /*
> - * Create a new temporary object directory; returns NULL on failure.
> + * Create a new temporary object directory with the specified prefix;
> + * returns NULL on failure.
>   */
> -struct tmp_objdir *tmp_objdir_create(void);
> +struct tmp_objdir *tmp_objdir_create(const char *prefix);
>
>  /*
>   * Return a list of environment strings, suitable for use with
> @@ -51,4 +52,11 @@ int tmp_objdir_destroy(struct tmp_objdir *);
>   */
>  void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
>
> +/*
> + * Replaces the main object store in the current process with the temporary
> + * object directory and makes the former main object store an alternate.
> + * If will_destroy is nonzero, the object directory may not be migrated.
> + */
> +void tmp_objdir_replace_primary_odb(struct tmp_objdir *, int will_destroy);
> +
>  #endif /* TMP_OBJDIR_H */
> --
> gitgitgadget

Other than those minor things, I couldn't find any problems.


* Re: [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases
  2021-09-29  8:41               ` Elijah Newren
@ 2021-09-29 16:40                 ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-09-29 16:40 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Neeraj Singh via GitGitGadget, Git Mailing List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Wed, Sep 29, 2021 at 1:42 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> Thanks for working on this, and for moving this up in your series near
> the beginning.
>
> On Tue, Sep 28, 2021 at 4:34 PM Neeraj Singh via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > This patch is based on work by Elijah Newren. Any bugs however are my
> > own.
>
> This kind of information is often included in a commit message via a
> trailer such as:
>     Based-on-patch-by: Elijah Newren <newren@gmail.com>
> or Helped-by: or Co-authored-by: or Contributions-by: .

Will fix. I didn't know which trailers were acceptable. I'll use:
Based-on-patch-by: Elijah Newren <newren@gmail.com>

> >
> > +struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy)
> > +{
> > +       struct object_directory *new_odb;
> > +
> > +       /*
> > +        * Make sure alternates are initialized, or else our entry may be
> > +        * overwritten when they are.
> > +        */
> > +       prepare_alt_odb(the_repository);
>
> This implicit dependence on the_repository is unfortunate.  My
> versions passed the repository parameter explicitly.  While my
> remerge-diff code doesn't really make use of that currently, it could
> make sense to have temporary object stores for a submodule and do
> remerge-diff work on them.  You've also got two more uses of
> the_repository later in this function.

The core loose object code in object-file.c is riven with
the_repository assumptions. I'd have to refactor that code (including
the alternates code) to take repository arguments.  Given the
extensive assumptions, I'd like to push back on this suggestion and
all of the related suggestions.

> > diff --git a/object-store.h b/object-store.h
> > index 551639f173d..5bc9da6634e 100644
> > --- a/object-store.h
> > +++ b/object-store.h
> > @@ -31,7 +31,12 @@ struct object_directory {
> >          * This is a temporary object store, so there is no need to
> >          * create new objects via rename.
> >          */
> > -       int is_temp;
> > +       int is_temp : 8;
> > +
> > +       /*
> > +        * This object store is ephemeral, so there is no need to fsync.
> > +        */
> > +       int will_destroy : 8;
>
> Why 8 bits wide rather than 1?  I thought these were boolean
> values...was I mistaken?
>
> (Also, if boolean and compressing to 1 bit, should probably be
> unsigned rather than signed.)

This will go away when I drop the rename patch. The 8-bit width was a
micro-optimization: accessing a bit usually compiles to more or larger
instructions than accessing a byte. I wish we had a standard boolean
type that is one char wide.
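
For comparison, the nearest standard type is C99's `bool` (`_Bool`) from `<stdbool.h>`. It is one byte on common ABIs, although the C standard does not fix its size, and converting any nonzero value to it yields 1. This sketch assumes `<stdbool.h>` would be acceptable in the codebase; this thread does not establish that. `odb_flags_bool` and `should_fsync` are hypothetical names. The fsync check mirrors the close_loose_object() change in this patch.

```c
#include <stdbool.h>

/* Hypothetical flags struct using C99 bool instead of bit-fields. */
struct odb_flags_bool {
    bool is_temp;
    bool will_destroy;
};

/* Mirrors the patch's check: skip fsync for an ODB that will be destroyed. */
static bool should_fsync(const struct odb_flags_bool *f, bool fsync_object_files)
{
    return fsync_object_files && !f->will_destroy;
}
```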

> > +        */
> > +       strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX", get_object_directory(), prefix);
> >
> >         /*
> >          * Grow the strbuf beyond any filename we expect to be placed in it.
> > @@ -269,6 +279,15 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
> >         if (!t)
> >                 return 0;
> >
> > +
> > +
>
> Why so many blank lines?
>

This was an accident, will remove.

> Other than those minor things, I couldn't find any problems.

Thanks for the review!


* [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                               ` (8 preceding siblings ...)
  2021-09-28 23:32             ` [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57             ` Neeraj K. Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
                                 ` (9 more replies)
  9 siblings, 10 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh

Thanks to everyone for review so far!

This series shares the base tmp-objdir patches with my merged version of
Elijah Newren's remerge-diff series at:
https://github.com/neerajsi-msft/git/tree/neerajsi/remerge-diff.

The patch is now at version 8: changes since v7:

 * Dropped the tmp-objdir patch to avoid renaming in a quarantine/temporary
   objdir, as suggested by Jeff King. This wasn't a good idea because we
   don't really know that there's only a single reader/writer. Avoiding the
   rename was a relatively minor perf optimization so it's okay to drop.

 * Added disable_ref_updates logic (as a flag on the odb) which is set when
   we're in a quarantine or when a tmp objdir is active. I believe this
   roughly follows the strategy suggested by Jeff King.
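
The v8 mechanism can be pictured as one flag consulted in one place and set in two:

* A child process sets it when it sees GIT_QUARANTINE_PATH in its environment.
* The current process sets it when it installs a temporary objdir as the primary ODB.
* The ref-transaction code refuses to proceed while the flag is set.

The sketch below uses hypothetical, simplified names. The real series puts `disable_ref_updates` on the odb and checks it from ref_transaction_prepare(). The error string is illustrative.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the per-ODB state in the series. */
struct odb_sketch {
    const char *path;
    int disable_ref_updates;
};

/* Child processes: a quarantine is signalled through the environment. */
static void odb_init_from_env(struct odb_sketch *odb)
{
    if (getenv("GIT_QUARANTINE_PATH"))
        odb->disable_ref_updates = 1;
}

/* In-process: installing a temporary primary ODB sets the same flag. */
static void odb_mark_temporary(struct odb_sketch *tmp)
{
    tmp->disable_ref_updates = 1;
}

/* The gate a ref transaction would consult before doing any work. */
static int ref_transaction_prepare_sketch(const struct odb_sketch *primary,
                                          const char **err)
{
    if (primary->disable_ref_updates) {
        *err = "ref updates forbidden inside quarantine environment";
        return -1;
    }
    return 0;
}
```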

The patch is now at version 7: changes since v6:

 * Rebased onto current upstream master

 * Separate the tmp-objdir changes and move to the beginning of the series
   so that Elijah Newren's similar changes can be merged.

 * Use some of Elijah's implementation for replacing the primary ODB. I was
   doing some unnecessarily complex copying for no good reason.

 * Make the tmp objdir code use a name beginning with tmp_ and having an
   operation-specific prefix.

 * Add git-prune support for removing a stale object directory.
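
A sketch of the naming convention follows:

* The temporary directory template starts with `tmp_` and embeds an operation-specific prefix.
* A prune-style sweep can therefore recognize leftovers by name alone.
* The real prune code also compares the entry's mtime against the expiry, as in the builtin/prune.c hunk in this thread.

The function names here are hypothetical. The template string matches the one in the tmp-objdir.c hunk.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the template for a temporary object directory, e.g.
 * "<objdir>/tmp_objdir-incoming-XXXXXX"; the XXXXXX is later filled in
 * by a mkdtemp()-style call. Returns -1 if the buffer is too small.
 */
static int tmp_objdir_template(char *buf, size_t size,
                               const char *objdir, const char *prefix)
{
    int n = snprintf(buf, size, "%s/tmp_objdir-%s-XXXXXX", objdir, prefix);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Name-based check a prune-style sweep can apply to entries of the objdir. */
static int looks_like_tmp_entry(const char *basename)
{
    return strncmp(basename, "tmp_", 4) == 0;
}
```

The old `incoming-XXXXXX` name would not pass this check, which is why the name had to change before prune could clean it up.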

v5 was a bit of a dud, with some issues that I only noticed after
submitting. v6 changes:

 * re-add Windows support
 * fix minor formatting issues
 * reset git author and commit dates which got messed up

Changes since v4, all in response to review feedback from Ævar Arnfjörð
Bjarmason:

 * Update core.fsyncobjectfiles documentation to specify 'loose' objects and
   to add a statement about not fsyncing parent directories.
   
   * I still don't want to make any promises on behalf of the Linux FS developers
     in the documentation. However, according to [v4.1] and my understanding
     of how XFS journals are documented to work, it looks like recent versions
     of Linux running on XFS should be as safe as Windows or macOS in 'batch'
     mode. I don't know about ext4, since it's not clear to me when metadata
     updates are made visible to the journal.
   

 * Rewrite the core batched fsync change to use the tmp-objdir lib. As Ævar
   pointed out, this lets us access the added loose objects immediately,
   rather than only after unplugging the bulk checkin. This is a hard
   requirement in unpack-objects for resolving OBJ_REF_DELTA packed objects.
   
   * As a preparatory patch, the object-file code now doesn't do a rename if it's in a
     tmp objdir (as determined by the quarantine environment variable).
   
   * I added support to the tmp-objdir lib to replace the 'main' writable odb.
   
   * Instead of using a lockfile for the final full fsync, we now use a new dummy
     temp file. Doing that makes the below unpack-objects change easier.
   

 * Add bulk-checkin support to unpack-objects, which is used in fetch and
   push. In addition to making those operations faster, it allows us to
   directly compare performance of packfiles against loose objects. Please
   see [v4.2] for a measurement of 'git push' to a local upstream with
   different numbers of unique new files.

 * Rename FSYNC_OBJECT_FILES_MODE to fsync_object_files_mode.

 * Remove comment with link to NtFlushBuffersFileEx documentation.

 * Make t/lib-unique-files.sh a bit cleaner. We are still creating unique
   contents, but now this uses test_tick, so it should be deterministic from
   run to run.

 * Ensure there are tests for all of the modified commands. Make the
   unpack-objects tests validate that the unpacked objects are really
   available in the ODB.
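
The core idea of batch mode:

1. Write each loose object without a per-file fsync.
2. At the end of the batch, issue a single fsync on a fresh dummy file.

This is a minimal POSIX sketch under stated assumptions. The function names and the `bulk_fsync_XXXXXX` template are hypothetical. It omits any per-file writeback request the actual series may make. Whether one fsync makes the earlier files durable depends on the OS and filesystem, so the sketch shows the shape of the technique, not a durability guarantee. The read-only 0444 mode matches the loose-object mode visible in the range-diff.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write one file and close it WITHOUT fsync; durability is deferred. */
static int write_without_fsync(const char *path, const char *data)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0444);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    ssize_t written = write(fd, data, len);
    int close_err = close(fd);
    return (written == (ssize_t)len && close_err == 0) ? 0 : -1;
}

/*
 * Create a fresh dummy file in `dir`, fsync it once, then remove it.
 * Whether this single fsync also makes the earlier files durable is
 * filesystem- and OS-dependent; this sketch does not guarantee it.
 */
static int batch_flush(const char *dir)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/bulk_fsync_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    int ret = fsync(fd);
    if (close(fd) != 0)
        ret = -1;
    unlink(path);
    return ret;
}
```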

References for v4:

[v4.1] https://lore.kernel.org/linux-fsdevel/20190419072938.31320-1-amir73il@gmail.com/#t
[v4.2] https://docs.google.com/spreadsheets/d/1uxMBkEXFFnQ1Y3lXKqcKpw6Mq44BzhpCAcPex14T-QQ/edit#gid=1898936117

Changes since v3:

 * Fix core.fsyncobjectfiles option parsing as suggested by Junio: We now
   accept no value to mean "true" and we require 'batch' to be lowercase.

 * Leave the default fsync mode as 'false'. Git for Windows can change its
   default when this series makes it over to that fork.

 * Use a switch statement in git_fsync, as suggested by Junio.

 * Add regression test cases for core.fsyncobjectfiles=batch. This should
   keep the batch functionality basically working in upstream git even if
   few users adopt batch mode initially. I expect git-for-windows will
   provide a good baking area for the new mode.
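
The v3 parsing rules are: a bare key means true, and `batch` must be lowercase. Together with a switch on the resulting mode, they might look like the sketch below. The enum names, `parse_fsync_mode()` and the simplified boolean parser are hypothetical stand-ins. git's real config code uses its own boolean parser, which accepts more forms than this one.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical mode enum; the series' real names may differ. */
enum fsync_mode {
    FSYNC_MODE_OFF,
    FSYNC_MODE_ON,
    FSYNC_MODE_BATCH,
};

static int eq_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Simplified boolean parser: 1 for true, 0 for false, -1 if unrecognized. */
static int parse_bool_simplified(const char *v)
{
    if (eq_ignore_case(v, "true") || eq_ignore_case(v, "yes") ||
        eq_ignore_case(v, "on") || !strcmp(v, "1"))
        return 1;
    if (eq_ignore_case(v, "false") || eq_ignore_case(v, "no") ||
        eq_ignore_case(v, "off") || !strcmp(v, "0") || !*v)
        return 0;
    return -1;
}

/*
 * value == NULL models a bare "fsyncObjectFiles" key with no "=", which
 * means true. "batch" is matched case-sensitively, so "Batch" is rejected.
 */
static int parse_fsync_mode(const char *value, enum fsync_mode *out)
{
    if (!value) {
        *out = FSYNC_MODE_ON;
        return 0;
    }
    if (!strcmp(value, "batch")) {
        *out = FSYNC_MODE_BATCH;
        return 0;
    }
    int b = parse_bool_simplified(value);
    if (b < 0)
        return -1;
    *out = b ? FSYNC_MODE_ON : FSYNC_MODE_OFF;
    return 0;
}

/* Dispatch on the mode with a switch. */
static const char *fsync_mode_action(enum fsync_mode mode)
{
    switch (mode) {
    case FSYNC_MODE_OFF:
        return "skip fsync";
    case FSYNC_MODE_ON:
        return "fsync each loose object";
    case FSYNC_MODE_BATCH:
        return "defer to one flush per batch";
    }
    return "unknown";
}
```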

Neeraj Singh (9):
  tmp-objdir: new API for creating temporary writable databases
  tmp-objdir: disable ref updates when replacing the primary odb
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       | 29 ++++++++--
 Makefile                            |  6 ++
 builtin/prune.c                     | 22 +++++--
 builtin/receive-pack.c              |  2 +-
 builtin/unpack-objects.c            |  3 +
 builtin/update-index.c              |  6 ++
 bulk-checkin.c                      | 90 +++++++++++++++++++++++++----
 bulk-checkin.h                      |  2 +
 cache.h                             |  8 ++-
 compat/mingw.h                      |  3 +
 compat/win32/flush.c                | 28 +++++++++
 config.c                            |  7 ++-
 config.mak.uname                    |  3 +
 configure.ac                        |  8 +++
 contrib/buildsystems/CMakeLists.txt |  3 +-
 environment.c                       |  6 +-
 git-compat-util.h                   |  7 +++
 object-file.c                       | 60 ++++++++++++++++++-
 object-store.h                      | 26 +++++++++
 object.c                            |  2 +-
 refs.c                              |  2 +-
 repository.c                        |  2 +
 repository.h                        |  1 +
 t/lib-unique-files.sh               | 36 ++++++++++++
 t/perf/p3700-add.sh                 | 43 ++++++++++++++
 t/perf/p3900-stash.sh               | 46 +++++++++++++++
 t/t3700-add.sh                      | 20 +++++++
 t/t3903-stash.sh                    | 14 +++++
 t/t5300-pack-object.sh              | 30 ++++++----
 tmp-objdir.c                        | 30 +++++++++-
 tmp-objdir.h                        | 14 ++++-
 wrapper.c                           | 48 +++++++++++++++
 write-or-die.c                      |  2 +-
 33 files changed, 562 insertions(+), 47 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: cefe983a320c03d7843ac78e73bd513a27806845
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v8
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v8
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v7:

  2:  6ce72a709a1 !  1:  f03797fd80d tmp-objdir: new API for creating temporary writable databases
     @@ Metadata
       ## Commit message ##
          tmp-objdir: new API for creating temporary writable databases
      
     -    This patch is based on work by Elijah Newren. Any bugs however are my
     -    own.
     -
          The tmp_objdir API provides the ability to create temporary object
          directories, but was designed with the goal of having subprocesses
          access these object stores, followed by the main process migrating
     @@ Commit message
          make sure that they have a name starting with tmp_ and containing an
          operation-specific name.
      
     +    Based-on-patch-by: Elijah Newren <newren@gmail.com>
     +
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
       ## builtin/prune.c ##
     @@ object-file.c: void add_to_alternates_memory(const char *reference)
      +	 */
      +	new_odb = xcalloc(1, sizeof(*new_odb));
      +	new_odb->path = xstrdup(dir);
     -+	new_odb->is_temp = 1;
      +	new_odb->will_destroy = will_destroy;
      +	new_odb->next = the_repository->objects->odb;
      +	the_repository->objects->odb = new_odb;
     @@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void
      
       ## object-store.h ##
      @@ object-store.h: struct object_directory {
     - 	 * This is a temporary object store, so there is no need to
     - 	 * create new objects via rename.
     - 	 */
     --	int is_temp;
     -+	int is_temp : 8;
     -+
     + 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
     + 	struct oidtree *loose_objects_cache;
     + 
      +	/*
      +	 * This object store is ephemeral, so there is no need to fsync.
      +	 */
     -+	int will_destroy : 8;
     - 
     ++	int will_destroy;
     ++
       	/*
       	 * Path to the alternative object store. If this is a relative path,
     + 	 * it is relative to the current working directory.
      @@ object-store.h: void add_to_alternates_file(const char *dir);
        */
       void add_to_alternates_memory(const char *dir);
     @@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
       	if (!t)
       		return 0;
       
     -+
     -+
      +	if (t->prev_odb) {
      +		if (the_repository->objects->odb->will_destroy)
     -+			BUG("migrating and ODB that was marked for destruction");
     ++			BUG("migrating an ODB that was marked for destruction");
      +		restore_primary_odb(t->prev_odb, t->path.buf);
      +		t->prev_odb = NULL;
      +	}
  1:  6e65f68fd6d !  2:  bc085137340 object-file.c: do not rename in a temp odb
     @@ Metadata
      Author: Neeraj Singh <neerajsi@microsoft.com>
      
       ## Commit message ##
     -    object-file.c: do not rename in a temp odb
     +    tmp-objdir: disable ref updates when replacing the primary odb
      
     -    If a temporary ODB is active, as determined by GIT_QUARANTINE_PATH
     -    being set, create object files with their final names. This avoids
     -    an extra rename beyond what is needed to merge the temporary ODB in
     -    tmp_objdir_migrate.
     +    When creating a subprocess with a temporary ODB, we set the
     +    GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
     +    to update refs, since the tmp-objdir may go away.
      
     -    Creating an object file with the expected final name should be okay
     -    since the git process writing to the temporary object store is the
     -    only writer, and it only invokes write_loose_object/create_object_file
     -    after checking that the object doesn't exist.
     +    Introduce a similar mechanism for in-process temporary ODBs when
     +    we call tmp_objdir_replace_primary_odb. Now both mechanisms set
     +    the disable_ref_updates flag on the odb, which is queried by
     +    the ref_transaction_prepare function.
     +
     +    Note: This change adds an assumption that the state of
     +    the_repository is relevant for any ref transaction that might
     +    be initiated. Unwinding this assumption should be straightforward
     +    by saving the relevant repository to query in the transaction or
     +    the ref_store.
     +
     +    Peff's test case was invoking ref updates via the cachetextconv
     +    setting. That particular code silently does nothing when a ref
     +    update is forbidden. See the call to notes_cache_put in
     +    fill_textconv where errors are ignored.
     +
     +    Reported-by: Jeff King <peff@peff.net>
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
      
     @@ environment.c: void setup_git_env(const char *git_dir)
       	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
       	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
      +	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
     -+		args.object_dir_is_temp = 1;
     ++		args.disable_ref_updates = 1;
      +	}
      +
       	repo_set_gitdir(the_repository, git_dir, &args);
     @@ environment.c: void setup_git_env(const char *git_dir)
       
      
       ## object-file.c ##
     -@@ object-file.c: static void write_object_file_prepare(const struct git_hash_algo *algo,
     - }
     - 
     - /*
     -- * Move the just written object into its final resting place.
     -+ * Move the just written object into its final resting place,
     -+ * unless it is already there, as indicated by an empty string for
     -+ * tmpfile.
     -  */
     - int finalize_object_file(const char *tmpfile, const char *filename)
     - {
     - 	int ret = 0;
     - 
     -+	if (!*tmpfile)
     -+		goto out;
     +@@ object-file.c: struct object_directory *set_temporary_primary_odb(const char *dir, int will_des
     + 	 */
     + 	new_odb = xcalloc(1, sizeof(*new_odb));
     + 	new_odb->path = xstrdup(dir);
      +
     - 	if (object_creation_mode == OBJECT_CREATION_USES_RENAMES)
     - 		goto try_rename;
     - 	else if (link(tmpfile, filename))
     -@@ object-file.c: static inline int directory_size(const char *filename)
     - }
     - 
     - /*
     -- * This creates a temporary file in the same directory as the final
     -- * 'filename'
     -+ * This creates a loose object file for the specified object id.
     -+ * If we're working in a temporary object directory, the file is
     -+ * created with its final filename, otherwise it is created with
     -+ * a temporary name and renamed by finalize_object_file.
     -+ * If no rename is required, an empty string is returned in tmp.
     -  *
     -  * We want to avoid cross-directory filename renames, because those
     -  * can have problems on various filesystems (FAT, NFS, Coda).
     -  */
     --static int create_tmpfile(struct strbuf *tmp, const char *filename)
     -+static int create_objfile(const struct object_id *oid, struct strbuf *tmp,
     -+			  struct strbuf *filename)
     - {
     --	int fd, dirlen = directory_size(filename);
     -+	int fd, dirlen, is_retrying = 0;
     -+	const char *object_name;
     -+	static const int object_mode = 0444;
     - 
     -+	loose_object_path(the_repository, filename, oid);
     -+	dirlen = directory_size(filename->buf);
     -+
     -+retry_create:
     - 	strbuf_reset(tmp);
     --	strbuf_add(tmp, filename, dirlen);
     --	strbuf_addstr(tmp, "tmp_obj_XXXXXX");
     --	fd = git_mkstemp_mode(tmp->buf, 0444);
     --	if (fd < 0 && dirlen && errno == ENOENT) {
     -+	if (!the_repository->objects->odb->is_temp) {
     -+		strbuf_add(tmp, filename->buf, dirlen);
     -+		object_name = "tmp_obj_XXXXXX";
     -+		strbuf_addstr(tmp, object_name);
     -+		fd = git_mkstemp_mode(tmp->buf, object_mode);
     -+	} else {
     -+		fd = open(filename->buf, O_CREAT | O_EXCL | O_RDWR, object_mode);
     -+	}
     -+
     -+	if (fd < 0 && dirlen && errno == ENOENT && !is_retrying) {
     - 		/*
     - 		 * Make sure the directory exists; note that the contents
     - 		 * of the buffer are undefined after mkstemp returns an
     -@@ object-file.c: static int create_tmpfile(struct strbuf *tmp, const char *filename)
     - 		 * scratch.
     - 		 */
     - 		strbuf_reset(tmp);
     --		strbuf_add(tmp, filename, dirlen - 1);
     -+		strbuf_add(tmp, filename->buf, dirlen - 1);
     - 		if (mkdir(tmp->buf, 0777) && errno != EEXIST)
     - 			return -1;
     - 		if (adjust_shared_perm(tmp->buf))
     - 			return -1;
     - 
     - 		/* Try again */
     --		strbuf_addstr(tmp, "/tmp_obj_XXXXXX");
     --		fd = git_mkstemp_mode(tmp->buf, 0444);
     -+		is_retrying = 1;
     -+		goto retry_create;
     - 	}
     - 	return fd;
     - }
     -@@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
     - 	static struct strbuf tmp_file = STRBUF_INIT;
     - 	static struct strbuf filename = STRBUF_INIT;
     - 
     --	loose_object_path(the_repository, &filename, oid);
     --
     --	fd = create_tmpfile(&tmp_file, filename.buf);
     -+	fd = create_objfile(oid, &tmp_file, &filename);
     - 	if (fd < 0) {
     - 		if (errno == EACCES)
     - 			return error(_("insufficient permission for adding an object to repository database %s"), get_object_directory());
     - 		else
     --			return error_errno(_("unable to create temporary file"));
     -+			return error_errno(_("unable to create object file"));
     - 	}
     - 
     - 	/* Set it up */
     ++	/*
     ++	 * Disable ref updates while a temporary odb is active, since
     ++	 * the objects in the database may roll back.
     ++	 */
     ++	new_odb->disable_ref_updates = 1;
     + 	new_odb->will_destroy = will_destroy;
     + 	new_odb->next = the_repository->objects->odb;
     + 	the_repository->objects->odb = new_odb;
      
       ## object-store.h ##
      @@ object-store.h: struct object_directory {
     @@ object-store.h: struct object_directory {
       	struct oidtree *loose_objects_cache;
       
      +	/*
     -+	 * This is a temporary object store, so there is no need to
     -+	 * create new objects via rename.
     ++	 * This is a temporary object store created by the tmp_objdir
     ++	 * facility. Disable ref updates since the objects in the store
     ++	 * might be discarded on rollback.
      +	 */
     -+	int is_temp;
     ++	unsigned int disable_ref_updates : 1;
      +
     + 	/*
     + 	 * This object store is ephemeral, so there is no need to fsync.
     + 	 */
     +-	int will_destroy;
     ++	unsigned int will_destroy : 1;
     + 
       	/*
       	 * Path to the alternative object store. If this is a relative path,
     - 	 * it is relative to the current working directory.
     +
     + ## refs.c ##
     +@@ refs.c: int ref_transaction_prepare(struct ref_transaction *transaction,
     + 		break;
     + 	}
     + 
     +-	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
     ++	if (the_repository->objects->odb->disable_ref_updates) {
     + 		strbuf_addstr(err,
     + 			      _("ref updates forbidden inside quarantine environment"));
     + 		return -1;
      
       ## repository.c ##
      @@ repository.c: void repo_set_gitdir(struct repository *repo,
       	expand_base_dir(&repo->objects->odb->path, o->object_dir,
       			repo->commondir, "objects");
       
     -+	repo->objects->odb->is_temp = o->object_dir_is_temp;
     ++	repo->objects->odb->disable_ref_updates = o->disable_ref_updates;
      +
       	free(repo->objects->alternate_db);
       	repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
     @@ repository.h: struct set_gitdir_args {
       	const char *graft_file;
       	const char *index_file;
       	const char *alternate_db;
     -+	int object_dir_is_temp;
     ++	int disable_ref_updates;
       };
       
       void repo_set_gitdir(struct repository *repo, const char *root,
  3:  c272f8776fa =  3:  9335646ed91 bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  4:  55556bb3e90 !  4:  b9d3d874432 core.fsyncobjectfiles: batched disk flushes
     @@ bulk-checkin.c: static int deflate_to_pack(struct bulk_checkin_state *state,
      +	 */
      +	if (bulk_checkin_plugged &&
      +	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
     -+		assert(the_repository->objects->odb->is_temp);
      +		if (!needs_batch_fsync)
      +			needs_batch_fsync = 1;
      +	} else {
     @@ bulk-checkin.c: int index_bulk_checkin(struct object_id *oid,
       	assert(!bulk_checkin_plugged);
      +
      +	/*
     -+	 * Create a temporary object directory if the current
     -+	 * object directory is not already temporary.
     ++	 * A temporary object directory is used to hold the files
     ++	 * while they are not fsynced.
      +	 */
     -+	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH &&
     -+	    !the_repository->objects->odb->is_temp) {
     ++	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH) {
      +		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
      +		if (!bulk_fsync_objdir)
      +			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
     @@ object-file.c: int hash_object_file(const struct git_hash_algo *algo, const void
       
       	if (close(fd) != 0)
      
     - ## tmp-objdir.c ##
     -@@ tmp-objdir.c: int tmp_objdir_migrate(struct tmp_objdir *t)
     - 	if (!t)
     - 		return 0;
     - 
     --
     --
     - 	if (t->prev_odb) {
     - 		if (the_repository->objects->odb->will_destroy)
     - 			BUG("migrating and ODB that was marked for destruction");
     -
       ## wrapper.c ##
      @@ wrapper.c: int xmkstemp_mode(char *filename_template, int mode)
       	return fd;
  5:  6c33e79d6f0 =  5:  8df32eaaa9a core.fsyncobjectfiles: add windows support for batch mode
  6:  09dbff1004e =  6:  15767270984 update-index: use the bulk-checkin infrastructure
  7:  1eced9f9f9a =  7:  e88bab809a2 unpack-objects: use the bulk-checkin infrastructure
  8:  7aaa08d5f5f =  8:  811d6d31509 core.fsyncobjectfiles: tests for batch mode
  9:  ff286fb461a =  9:  f4fa20f591e core.fsyncobjectfiles: performance tests for add and stash

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
                                 ` (8 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The tmp_objdir API provides the ability to create temporary object
directories, but was designed with the goal of having subprocesses
access these object stores, followed by the main process migrating
objects from them to the main object store or just deleting them.  The
subprocesses would view the temporary directory as their primary
datastore and write to it.

Here we add the tmp_objdir_replace_primary_odb function that replaces
the current process's writable "main" object directory with the
specified one. The previous main object directory is restored in either
tmp_objdir_migrate or tmp_objdir_destroy.
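
To make the replace/restore pairing concrete, here is a minimal,
self-contained model of what set_temporary_primary_odb() and
restore_primary_odb() do to the chain of object directories. It is a
sketch only: `struct odb_model`, `primary`, `replace_primary()` and
`restore_primary()` are hypothetical stand-ins for Git's
`struct object_directory` and `the_repository->objects->odb`, and the
assert() calls play the role of the BUG() checks in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's struct object_directory. */
struct odb_model {
	char *path;
	unsigned will_destroy : 1;
	struct odb_model *next; /* next store in the alternates chain */
};

/* Hypothetical stand-in for the_repository->objects->odb. */
static struct odb_model *primary;

/* Push a new primary store; the old primary becomes the first alternate. */
static struct odb_model *replace_primary(const char *dir, int will_destroy)
{
	struct odb_model *new_odb = calloc(1, sizeof(*new_odb));

	if (!new_odb || !(new_odb->path = strdup(dir)))
		abort();
	new_odb->will_destroy = !!will_destroy;
	new_odb->next = primary;
	primary = new_odb;
	return new_odb->next;
}

/* Pop the temporary store and make 'restore' the primary again. */
static void restore_primary(struct odb_model *restore, const char *old_path)
{
	struct odb_model *cur = primary;

	/* The patch reports violations of these invariants with BUG(). */
	assert(!strcmp(cur->path, old_path));
	assert(cur->next == restore);

	primary = restore;
	free(cur->path);
	free(cur);
}
```

Because the old primary simply becomes the first alternate, lookups of
existing objects keep working while every new write lands in the
temporary directory.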

For the --remerge-diff use case, add a new `will_destroy` flag in `struct
object_directory` to mark ephemeral object databases that do not require
fsync durability.
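
A minimal sketch of the resulting decision in close_loose_object(),
assuming a stand-in global `fsync_object_files` and passing the primary
ODB's will_destroy flag in as a parameter. `close_loose_object_model()`
is hypothetical; it returns whether an fsync was issued so that the
behavior can be checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the core.fsyncObjectFiles setting. */
static int fsync_object_files = 1;

/*
 * Model of close_loose_object(): skip the fsync when the primary ODB is
 * ephemeral. Returns 1 if fsync() was issued, 0 if skipped, -1 on error.
 */
static int close_loose_object_model(int fd, int odb_will_destroy)
{
	int synced = 0;

	if (!odb_will_destroy && fsync_object_files) {
		if (fsync(fd) < 0) {
			close(fd);
			return -1;
		}
		synced = 1;
	}
	if (close(fd) != 0)
		return -1;
	return synced;
}
```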

Add 'git prune' support for removing temporary object databases, and
make sure that their names start with tmp_ and include an
operation-specific prefix.
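
The naming scheme can be sketched as follows. `tmp_objdir_template()`
builds the same template string that tmp_objdir_create() formats in
this patch, and `looks_like_tmp_entry()` is a simplified version of the
kind of "tmp_" prefix test prune applies to entries in the object
directory. Both functions are hypothetical; neither exists in Git under
these names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<objdir>/tmp_objdir-<prefix>-XXXXXX", a mkdtemp()-style template.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int tmp_objdir_template(char *buf, size_t len,
			       const char *objdir, const char *prefix)
{
	int n = snprintf(buf, len, "%s/tmp_objdir-%s-XXXXXX", objdir, prefix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Simplified stale-entry check: anything starting with "tmp_". */
static int looks_like_tmp_entry(const char *basename)
{
	return !strncmp(basename, "tmp_", 4);
}
```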

Based-on-patch-by: Elijah Newren <newren@gmail.com>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/prune.c        | 22 +++++++++++++++++----
 builtin/receive-pack.c |  2 +-
 object-file.c          | 44 ++++++++++++++++++++++++++++++++++++++++--
 object-store.h         | 19 ++++++++++++++++++
 object.c               |  2 +-
 tmp-objdir.c           | 30 +++++++++++++++++++++++++---
 tmp-objdir.h           | 14 +++++++++++---
 7 files changed, 119 insertions(+), 14 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 02c6ab7cbaa..9c72ecf5a58 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -18,6 +18,7 @@ static int show_only;
 static int verbose;
 static timestamp_t expire;
 static int show_progress = -1;
+static struct strbuf remove_dir_buf = STRBUF_INIT;
 
 static int prune_tmp_file(const char *fullpath)
 {
@@ -26,10 +27,19 @@ static int prune_tmp_file(const char *fullpath)
 		return error("Could not stat '%s'", fullpath);
 	if (st.st_mtime > expire)
 		return 0;
-	if (show_only || verbose)
-		printf("Removing stale temporary file %s\n", fullpath);
-	if (!show_only)
-		unlink_or_warn(fullpath);
+	if (S_ISDIR(st.st_mode)) {
+		if (show_only || verbose)
+			printf("Removing stale temporary directory %s\n", fullpath);
+		if (!show_only) {
+			strbuf_addstr(&remove_dir_buf, fullpath);
+			remove_dir_recursively(&remove_dir_buf, 0);
+		}
+	} else {
+		if (show_only || verbose)
+			printf("Removing stale temporary file %s\n", fullpath);
+		if (!show_only)
+			unlink_or_warn(fullpath);
+	}
 	return 0;
 }
 
@@ -97,6 +107,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
 
 static int prune_subdir(unsigned int nr, const char *path, void *data)
 {
+	if (verbose)
+		printf("Removing directory %s\n", path);
+
 	if (!show_only)
 		rmdir(path);
 	return 0;
@@ -185,5 +198,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
 	}
 
+	strbuf_release(&remove_dir_buf);
 	return 0;
 }
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 48960a9575b..418a42ca069 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2208,7 +2208,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		strvec_push(&child.args, alt_shallow_file);
 	}
 
-	tmp_objdir = tmp_objdir_create();
+	tmp_objdir = tmp_objdir_create("incoming");
 	if (!tmp_objdir) {
 		if (err_fd > 0)
 			close(err_fd);
diff --git a/object-file.c b/object-file.c
index be4f94ecf3b..990381abee5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -751,6 +751,43 @@ void add_to_alternates_memory(const char *reference)
 			     '\n', NULL, 0);
 }
 
+struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy)
+{
+	struct object_directory *new_odb;
+
+	/*
+	 * Make sure alternates are initialized, or else our entry may be
+	 * overwritten when they are.
+	 */
+	prepare_alt_odb(the_repository);
+
+	/*
+	 * Make a new primary odb and link the old primary ODB in as an
+	 * alternate
+	 */
+	new_odb = xcalloc(1, sizeof(*new_odb));
+	new_odb->path = xstrdup(dir);
+	new_odb->will_destroy = will_destroy;
+	new_odb->next = the_repository->objects->odb;
+	the_repository->objects->odb = new_odb;
+	return new_odb->next;
+}
+
+void restore_primary_odb(struct object_directory *restore_odb, const char *old_path)
+{
+	struct object_directory *cur_odb = the_repository->objects->odb;
+
+	if (strcmp(old_path, cur_odb->path))
+		BUG("expected %s as primary object store; found %s",
+		    old_path, cur_odb->path);
+
+	if (cur_odb->next != restore_odb)
+		BUG("we expect the old primary object store to be the first alternate");
+
+	the_repository->objects->odb = restore_odb;
+	free_object_directory(cur_odb);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
@@ -1888,8 +1925,11 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 /* Finalize a file on disk, and close it. */
 static void close_loose_object(int fd)
 {
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
+	if (!the_repository->objects->odb->will_destroy) {
+		if (fsync_object_files)
+			fsync_or_die(fd, "loose object file");
+	}
+
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
 }
diff --git a/object-store.h b/object-store.h
index c5130d8baea..74b1b5872a6 100644
--- a/object-store.h
+++ b/object-store.h
@@ -27,6 +27,11 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This object store is ephemeral, so there is no need to fsync.
+	 */
+	int will_destroy;
+
 	/*
 	 * Path to the alternative object store. If this is a relative path,
 	 * it is relative to the current working directory.
@@ -58,6 +63,17 @@ void add_to_alternates_file(const char *dir);
  */
 void add_to_alternates_memory(const char *dir);
 
+/*
+ * Replace the current writable object directory with the specified temporary
+ * object directory; returns the former primary object directory.
+ */
+struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy);
+
+/*
+ * Restore a previous ODB replaced by set_temporary_primary_odb.
+ */
+void restore_primary_odb(struct object_directory *restore_odb, const char *old_path);
+
 /*
  * Populate and return the loose object cache array corresponding to the
  * given object ID.
@@ -68,6 +84,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
 /* Empty the loose object cache for the specified object directory. */
 void odb_clear_loose_cache(struct object_directory *odb);
 
+/* Clear and free the specified object directory */
+void free_object_directory(struct object_directory *odb);
+
 struct packed_git {
 	struct hashmap_entry packmap_ent;
 	struct packed_git *next;
diff --git a/object.c b/object.c
index 4e85955a941..98635bc4043 100644
--- a/object.c
+++ b/object.c
@@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
 	return o;
 }
 
-static void free_object_directory(struct object_directory *odb)
+void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index b8d880e3626..45d42a7bcf0 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -11,6 +11,7 @@
 struct tmp_objdir {
 	struct strbuf path;
 	struct strvec env;
+	struct object_directory *prev_odb;
 };
 
 /*
@@ -38,6 +39,9 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	if (t == the_tmp_objdir)
 		the_tmp_objdir = NULL;
 
+	if (!on_signal && t->prev_odb)
+		restore_primary_odb(t->prev_odb, t->path.buf);
+
 	/*
 	 * This may use malloc via strbuf_grow(), but we should
 	 * have pre-grown t->path sufficiently so that this
@@ -52,6 +56,7 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	 */
 	if (!on_signal)
 		tmp_objdir_free(t);
+
 	return err;
 }
 
@@ -121,7 +126,7 @@ static int setup_tmp_objdir(const char *root)
 	return ret;
 }
 
-struct tmp_objdir *tmp_objdir_create(void)
+struct tmp_objdir *tmp_objdir_create(const char *prefix)
 {
 	static int installed_handlers;
 	struct tmp_objdir *t;
@@ -129,11 +134,16 @@ struct tmp_objdir *tmp_objdir_create(void)
 	if (the_tmp_objdir)
 		BUG("only one tmp_objdir can be used at a time");
 
-	t = xmalloc(sizeof(*t));
+	t = xcalloc(1, sizeof(*t));
 	strbuf_init(&t->path, 0);
 	strvec_init(&t->env);
 
-	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
+	/*
+	 * Use a string starting with tmp_ so that the builtin/prune.c code
+	 * can recognize any stale objdirs left behind by a crash and delete
+	 * them.
+	 */
+	strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX", get_object_directory(), prefix);
 
 	/*
 	 * Grow the strbuf beyond any filename we expect to be placed in it.
@@ -269,6 +279,13 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
 	if (!t)
 		return 0;
 
+	if (t->prev_odb) {
+		if (the_repository->objects->odb->will_destroy)
+			BUG("migrating an ODB that was marked for destruction");
+		restore_primary_odb(t->prev_odb, t->path.buf);
+		t->prev_odb = NULL;
+	}
+
 	strbuf_addbuf(&src, &t->path);
 	strbuf_addstr(&dst, get_object_directory());
 
@@ -292,3 +309,10 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
 {
 	add_to_alternates_memory(t->path.buf);
 }
+
+void tmp_objdir_replace_primary_odb(struct tmp_objdir *t, int will_destroy)
+{
+	if (t->prev_odb)
+		BUG("the primary object database is already replaced");
+	t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
index b1e45b4c75d..75754cbfba6 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -10,7 +10,7 @@
  *
  * Example:
  *
- *	struct tmp_objdir *t = tmp_objdir_create();
+ *	struct tmp_objdir *t = tmp_objdir_create("incoming");
  *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
  *	    !tmp_objdir_migrate(t))
  *		printf("success!\n");
@@ -22,9 +22,10 @@
 struct tmp_objdir;
 
 /*
- * Create a new temporary object directory; returns NULL on failure.
+ * Create a new temporary object directory with the specified prefix;
+ * returns NULL on failure.
  */
-struct tmp_objdir *tmp_objdir_create(void);
+struct tmp_objdir *tmp_objdir_create(const char *prefix);
 
 /*
  * Return a list of environment strings, suitable for use with
@@ -51,4 +52,11 @@ int tmp_objdir_destroy(struct tmp_objdir *);
  */
 void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
 
+/*
+ * Replaces the main object store in the current process with the temporary
+ * object directory and makes the former main object store an alternate.
+ * If will_destroy is nonzero, the object directory may not be migrated.
+ */
+void tmp_objdir_replace_primary_odb(struct tmp_objdir *, int will_destroy);
+
 #endif /* TMP_OBJDIR_H */
-- 
gitgitgadget

* [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                                 ` (7 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When creating a subprocess with a temporary ODB, we set the
GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
to update refs, since the tmp-objdir may go away.

Introduce a similar mechanism for in-process temporary ODBs when
we call tmp_objdir_replace_primary_odb. Now both mechanisms set
the disable_ref_updates flag on the odb, which is queried by
the ref_transaction_prepare function.
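
A simplified model of the two paths that now set the flag and the
single place that reads it. All names here (`struct odb_flags`,
`init_primary_from_env()`, `push_temporary()`,
`ref_transaction_prepare_model()`) are hypothetical, and the environment
variable name assumes GIT_QUARANTINE_ENVIRONMENT expands to
"GIT_QUARANTINE_PATH".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of the flag carried by each object directory. */
struct odb_flags {
	unsigned disable_ref_updates : 1;
	struct odb_flags *next;
};

/* Like setup_git_env(): a quarantined child process inherits the flag. */
static void init_primary_from_env(struct odb_flags *odb)
{
	/* Assumed value of GIT_QUARANTINE_ENVIRONMENT. */
	odb->disable_ref_updates = getenv("GIT_QUARANTINE_PATH") != NULL;
}

/* Like set_temporary_primary_odb(): an in-process temporary ODB. */
static void push_temporary(struct odb_flags *tmp, struct odb_flags **primary)
{
	tmp->disable_ref_updates = 1;
	tmp->next = *primary;
	*primary = tmp;
}

/* Like the check in ref_transaction_prepare(); returns 0 or -1. */
static int ref_transaction_prepare_model(const struct odb_flags *primary)
{
	if (primary->disable_ref_updates) {
		fprintf(stderr, "ref updates forbidden inside quarantine environment\n");
		return -1;
	}
	return 0;
}
```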

Note: This change adds an assumption that the state of
the_repository is relevant for any ref transaction that might
be initiated. Unwinding this assumption should be straightforward
by saving the relevant repository to query in the transaction or
the ref_store.

Peff's test case was invoking ref updates via the cachetextconv
setting. That particular code silently does nothing when a ref
update is forbidden. See the call to notes_cache_put in
fill_textconv where errors are ignored.

Reported-by: Jeff King <peff@peff.net>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 environment.c  | 4 ++++
 object-file.c  | 6 ++++++
 object-store.h | 9 ++++++++-
 refs.c         | 2 +-
 repository.c   | 2 ++
 repository.h   | 1 +
 6 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/environment.c b/environment.c
index b4ba4fa22db..46ec5072c05 100644
--- a/environment.c
+++ b/environment.c
@@ -176,6 +176,10 @@ void setup_git_env(const char *git_dir)
 	args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT);
 	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
 	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
+	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+		args.disable_ref_updates = 1;
+	}
+
 	repo_set_gitdir(the_repository, git_dir, &args);
 	strvec_clear(&to_free);
 
diff --git a/object-file.c b/object-file.c
index 990381abee5..f16441afb93 100644
--- a/object-file.c
+++ b/object-file.c
@@ -767,6 +767,12 @@ struct object_directory *set_temporary_primary_odb(const char *dir, int will_des
 	 */
 	new_odb = xcalloc(1, sizeof(*new_odb));
 	new_odb->path = xstrdup(dir);
+
+	/*
+	 * Disable ref updates while a temporary odb is active, since
+	 * the objects in the database may roll back.
+	 */
+	new_odb->disable_ref_updates = 1;
 	new_odb->will_destroy = will_destroy;
 	new_odb->next = the_repository->objects->odb;
 	the_repository->objects->odb = new_odb;
diff --git a/object-store.h b/object-store.h
index 74b1b5872a6..bd53bdf2f2e 100644
--- a/object-store.h
+++ b/object-store.h
@@ -27,10 +27,17 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This is a temporary object store created by the tmp_objdir
+	 * facility. Disable ref updates since the objects in the store
+	 * might be discarded on rollback.
+	 */
+	unsigned int disable_ref_updates : 1;
+
 	/*
 	 * This object store is ephemeral, so there is no need to fsync.
 	 */
-	int will_destroy;
+	unsigned int will_destroy : 1;
 
 	/*
 	 * Path to the alternative object store. If this is a relative path,
diff --git a/refs.c b/refs.c
index 8b9f7c3a80a..7c182607dcf 100644
--- a/refs.c
+++ b/refs.c
@@ -2126,7 +2126,7 @@ int ref_transaction_prepare(struct ref_transaction *transaction,
 		break;
 	}
 
-	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+	if (the_repository->objects->odb->disable_ref_updates) {
 		strbuf_addstr(err,
 			      _("ref updates forbidden inside quarantine environment"));
 		return -1;
diff --git a/repository.c b/repository.c
index 710a3b4bf87..18e0526da01 100644
--- a/repository.c
+++ b/repository.c
@@ -80,6 +80,8 @@ void repo_set_gitdir(struct repository *repo,
 	expand_base_dir(&repo->objects->odb->path, o->object_dir,
 			repo->commondir, "objects");
 
+	repo->objects->odb->disable_ref_updates = o->disable_ref_updates;
+
 	free(repo->objects->alternate_db);
 	repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
 	expand_base_dir(&repo->graft_file, o->graft_file,
diff --git a/repository.h b/repository.h
index 3740c93bc0f..77316367d99 100644
--- a/repository.h
+++ b/repository.h
@@ -162,6 +162,7 @@ struct set_gitdir_args {
 	const char *graft_file;
 	const char *index_file;
 	const char *alternate_db;
+	int disable_ref_updates;
 };
 
 void repo_set_gitdir(struct repository *repo, const char *root,
-- 
gitgitgadget

* [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                                 ` (6 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'.  This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
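
The hazard described in the second bullet can be reproduced in
isolation. In this hypothetical sketch, `old_finish()` and
`new_finish()` stand in for finish_bulk_checkin() zeroing the whole
state struct; keeping the plug bit inside that struct loses it, while a
separate static survives. `plug()` mirrors the assert added to
plug_bulk_checkin().

```c
#include <assert.h>
#include <string.h>

/* Before: the plug bit lives inside the state that gets zeroed. */
struct old_state {
	unsigned plugged : 1;
	int nr_written;
};

/* Stand-in for finish_bulk_checkin(), which resets the whole state. */
static void old_finish(struct old_state *s)
{
	memset(s, 0, sizeof(*s));
}

/* After: the plug bit is a separate static that finishing never touches. */
static int plugged;

struct new_state {
	int nr_written;
};

static void plug(void)
{
	assert(!plugged);
	plugged = 1;
}

static void new_finish(struct new_state *s)
{
	memset(s, 0, sizeof(*s));
}
```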

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 8785b2ac806..6ae18401e04 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_tmp_packfile(struct strbuf *basename,
 				const char *pack_tmp_name,
@@ -277,21 +277,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget

* [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (2 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
                                 ` (5 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which reliably issues a pagecache writeback request as of version 5.2.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.

When the new mode is enabled, we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the case today,
   but may be a good extension to those components.
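
The per-object and unplug steps above can be sketched with plain POSIX
calls. This is an illustration under assumptions, not the patch's code:
`join_path()`, `writeout_only()`, `write_tmp_object()`, `finish_batch()`
and the dummy file name `bulk_fsync_dummy` are hypothetical;
sync_file_range() is used only on Linux, with a full fsync() fallback
elsewhere; and both directories are assumed to be on the same
filesystem so that rename() stays within one filesystem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Join dir and name into buf; returns -1 if the result would not fit. */
static int join_path(char *buf, size_t len, const char *dir, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the file's dirty pages back and wait for completion without
 * asking the drive to flush its cache (Linux only). Elsewhere this
 * sketch falls back to a full fsync(), which loses the batching benefit.
 */
static int writeout_only(int fd)
{
#if defined(__linux__) && defined(SYNC_FILE_RANGE_WRITE)
	return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WAIT_AFTER);
#else
	return fsync(fd);
#endif
}

/* Steps 1 and 2: create one object in the temporary directory. */
static int write_tmp_object(const char *tmpdir, const char *name,
			    const void *data, size_t len)
{
	char path[4096];
	ssize_t n;
	int fd;

	if (join_path(path, sizeof(path), tmpdir, name) < 0)
		return -1;
	fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0444);
	if (fd < 0)
		return -1;
	n = write(fd, data, len);
	if (n < 0 || (size_t)n != len || writeout_only(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Unplug: one fsync on a dummy file, then rename every object. */
static int finish_batch(const char *tmpdir, const char *dstdir,
			const char **names, int nr)
{
	char src[4096], dst[4096];
	int fd, i, ret;

	assert(nr >= 0);
	if (join_path(src, sizeof(src), tmpdir, "bulk_fsync_dummy") < 0)
		return -1;
	fd = open(src, O_CREAT | O_WRONLY, 0600);
	if (fd < 0)
		return -1;
	ret = fsync(fd);
	close(fd);
	unlink(src);
	if (ret < 0)
		return -1;

	for (i = 0; i < nr; i++) {
		if (join_path(src, sizeof(src), tmpdir, names[i]) < 0 ||
		    join_path(dst, sizeof(dst), dstdir, names[i]) < 0 ||
		    rename(src, dst) < 0)
			return -1;
	}
	return 0;
}
```

The design point is that only one call in the whole batch (the
dummy-file fsync) asks the drive to flush its cache; each per-object
call only waits for pagecache writeback.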

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename), such as NTFS, HFS+, or XFS,
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability, since a simple fsync(2) call
does not flush any hardware caches.
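
A minimal sketch of such a flush wrapper, assuming a helper named
`full_fsync()` (hypothetical; it is not Git's fsync_or_die()). Falling
back to fsync() when F_FULLFSYNC fails is a common defensive choice,
not something this patch is stated to do. On platforms without
F_FULLFSYNC the sketch is just fsync().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Flush a file to stable storage. On macOS, fsync(2) only hands data to
 * the drive; F_FULLFSYNC also asks the drive to flush its write cache.
 */
static int full_fsync(int fd)
{
#ifdef F_FULLFSYNC
	if (fcntl(fd, F_FULLFSYNC) != -1)
		return 0;
	/* Some filesystems reject F_FULLFSYNC; fall back to fsync(). */
#endif
	return fsync(fd);
}
```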

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times are reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 29 +++++++++++----
 Makefile                      |  6 ++++
 bulk-checkin.c                | 68 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 ++
 cache.h                       |  8 ++++-
 config.c                      |  7 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 +++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 12 ++++++-
 wrapper.c                     | 44 +++++++++++++++++++++++
 write-or-die.c                |  2 +-
 13 files changed, 185 insertions(+), 11 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..200b4d9f06e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,29 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls loose objects in the object store, so updates to any
+	refs or the index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each loose object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware. Git does not currently fsync parent directories for
+  newly-added files, so some filesystems may still allow data to be lost on
+  system crash.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write loose object data with a minimal set of FLUSH
+  CACHE (or equivalent) commands sent to the storage controller. If the
+  operating system interfaces are not available, this mode behaves the same as
+  `true`. This mode is expected to be as safe as `true` on macOS for repos
+  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
+  ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index a9f9b689f0c..313b3dc7cd6 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1874,6 +1876,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ae18401e04..4deee1af46e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -79,6 +85,34 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur.  The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+	}
+
+	if (bulk_fsync_objdir)
+		tmp_objdir_migrate(bulk_fsync_objdir);
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -273,6 +307,25 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
+
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * as part of do_batch_fsync.
+	 */
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -287,6 +340,19 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * A temporary object directory is used to hold the files
+	 * while they are not fsynced.
+	 */
+	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH) {
+		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
+
+		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -296,4 +362,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index f6295f3b048..1ed8137b5e6 100644
--- a/cache.h
+++ b/cache.h
@@ -984,7 +984,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum fsync_object_files_mode {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum fsync_object_files_mode fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 2edf835262f..8315d020eeb 100644
--- a/config.c
+++ b/config.c
@@ -1506,7 +1506,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 76516aaa9a5..e6d482fbcc6 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -53,6 +53,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index 46ec5072c05..0ace9fa2167 100644
--- a/environment.c
+++ b/environment.c
@@ -42,7 +42,7 @@ const char *git_attributes_file;
 const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum fsync_object_files_mode fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index 7c99eef6612..9daee873782 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1213,6 +1213,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index f16441afb93..b035d88f309 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1932,8 +1932,18 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 static void close_loose_object(int fd)
 {
 	if (!the_repository->objects->odb->will_destroy) {
-		if (fsync_object_files)
+		switch (fsync_object_files) {
+		case FSYNC_OBJECT_FILES_OFF:
+			break;
+		case FSYNC_OBJECT_FILES_ON:
 			fsync_or_die(fd, "loose object file");
+			break;
+		case FSYNC_OBJECT_FILES_BATCH:
+			fsync_loose_object_bulk_checkin(fd);
+			break;
+		default:
+			BUG("Invalid fsync_object_files mode.");
+		}
 	}
 
 	if (close(fd) != 0)
diff --git a/wrapper.c b/wrapper.c
index 7c6586af321..bb4f9f043ce 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -540,6 +540,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * On macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On Linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index 0b1ec8190b6..cc8291d9794 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget
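
The tri-state parsing added to config.c above can be modeled in
isolation. This is a simplified, hypothetical stand-in: the real code
checks for the exact string "batch" and otherwise hands the value to
git_config_bool(), which dies on unrecognized input. Here
FSYNC_MODE_INVALID marks the case where Git would die, and the integer
branch ignores details such as the unit suffixes Git's integer parsing
accepts.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

enum fsync_mode {
	FSYNC_MODE_OFF,
	FSYNC_MODE_ON,
	FSYNC_MODE_BATCH,
	FSYNC_MODE_INVALID /* git_config_bool() would die here */
};

static enum fsync_mode parse_fsync_mode(const char *value)
{
	char *end;
	long n;

	if (!value)                   /* bare key with no '=': boolean true */
		return FSYNC_MODE_ON;
	if (!strcmp(value, "batch"))  /* exact, case-sensitive match */
		return FSYNC_MODE_BATCH;
	if (!*value)                  /* empty value: boolean false */
		return FSYNC_MODE_OFF;
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return FSYNC_MODE_ON;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return FSYNC_MODE_OFF;
	n = strtol(value, &end, 10);
	if (end != value && !*end)    /* plain integer: nonzero means true */
		return n ? FSYNC_MODE_ON : FSYNC_MODE_OFF;
	return FSYNC_MODE_INVALID;
}
```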



* [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (3 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                                 ` (4 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function it uses has
been available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.

The operating system is told to flush only file data, without issuing
a hardware flush primitive. A later full fsync will cause the metadata
log to be flushed and then the disk cache to be flushed on NTFS and
ReFS. Other filesystems will treat this as a full flush operation.

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.
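
The fallback described above (return -1 with ENOSYS when no
writeout-only primitive exists, then do a full fsync) can be sketched
portably by passing the platform primitive as a function pointer. The
names writeout_fn, writeout_or_enosys and sync_loose_object are
hypothetical; on Windows the primitive would be win32_fsync_no_flush.
Like fsync_loose_object_bulk_checkin, the sketch falls back to a full
fsync on any writeout failure, not only on ENOSYS.

```c
#include <errno.h>
#include <unistd.h>

/* A platform's "flush file data, skip the disk cache" call, if any. */
typedef int (*writeout_fn)(int fd);

static int writeout_or_enosys(int fd, writeout_fn platform_writeout)
{
	if (platform_writeout)
		return platform_writeout(fd);
	errno = ENOSYS; /* e.g. NtFlushBuffersFileEx missing before Windows 8 */
	return -1;
}

/*
 * Returns 1 if the data was only written out and a batched hardware
 * flush is still owed, 0 if the file was fully fsynced now, -1 on error.
 */
static int sync_loose_object(int fd, writeout_fn platform_writeout)
{
	if (writeout_or_enosys(fd, platform_writeout) == 0)
		return 1;
	return fsync(fd) == 0 ? 0 : -1;
}
```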

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 28 ++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..75324c24ee7
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,28 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+       IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+       DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+       if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+       }
+
+       memset(&io_status, 0, sizeof(io_status));
+       if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+       }
+
+       return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index e6d482fbcc6..34c93314a50 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -451,6 +451,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -626,6 +627,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index 171b4124afe..b573a5ee122 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index bb4f9f043ce..1a1e2fba9c9 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -567,6 +567,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 
-- 
gitgitgadget



* [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (4 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
                                 ` (3 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin for the update-index infrastructure to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality.

There is some risk with this change, since under batch fsync, the new
object files will not be visible to other processes until update-index
has entirely completed. Such a dependency is unlikely, since any tool
invoking update-index and expecting to see objects would have to
synchronize with the update-index
process after passing it a file path.
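
The visibility caveat can be illustrated with a toy, in-memory model:
objects added while plugged land in a staging area (standing in for the
tmp-objdir) and only become visible to other processes when unplugging
migrates them. Everything here is hypothetical, and it deliberately
ignores that, within the update-index process itself, the tmp-objdir is
installed as the primary object directory, so the new objects remain
readable there.

```c
#include <string.h>

#define TOY_MAX_OBJECTS 64

/* Toy model, not Git code: a primary store plus a staging area. */
struct toy_odb {
	const char *primary[TOY_MAX_OBJECTS];
	int nr_primary;
	const char *staged[TOY_MAX_OBJECTS];
	int nr_staged;
	int plugged;
};

static void toy_plug(struct toy_odb *odb)
{
	odb->plugged = 1;
}

/* While plugged, new objects go to the staging area. */
static int toy_add(struct toy_odb *odb, const char *oid)
{
	const char **list = odb->plugged ? odb->staged : odb->primary;
	int *nr = odb->plugged ? &odb->nr_staged : &odb->nr_primary;

	if (*nr == TOY_MAX_OBJECTS)
		return -1;
	list[(*nr)++] = oid;
	return 0;
}

/* What another process would see: only the primary store. */
static int toy_visible_to_others(const struct toy_odb *odb, const char *oid)
{
	for (int i = 0; i < odb->nr_primary; i++)
		if (!strcmp(odb->primary[i], oid))
			return 1;
	return 0;
}

static int toy_unplug(struct toy_odb *odb)
{
	/* Real code: one hardware flush here, then tmp_objdir_migrate(). */
	if (odb->nr_primary + odb->nr_staged > TOY_MAX_OBJECTS)
		return -1;
	for (int i = 0; i < odb->nr_staged; i++)
		odb->primary[odb->nr_primary++] = odb->staged[i];
	odb->nr_staged = 0;
	odb->plugged = 0;
	return 0;
}
```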

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..dc7368bb1ee 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1088,6 +1089,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1168,6 +1172,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "
-- 
gitgitgadget



* [PATCH v8 7/9] unpack-objects: use the bulk-checkin infrastructure
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (5 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
                                 ` (2 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.
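
The decision this refers to, and the payoff of batching inside the
unpack loop, can be sketched as follows. The comparison mirrors the
documented rule for transfer.unpackLimit (fewer objects than the limit
are unpacked into loose objects, otherwise the pack is kept). The flush
count assumes that each fsync costs one disk cache flush and that a
writeout-only primitive is available, so batch mode needs only the one
flush in do_batch_fsync. Both function names are hypothetical.

```c
#include <stdint.h>

enum pack_action { UNPACK_TO_LOOSE, KEEP_AS_PACK };

/* Below the limit, explode the pack into loose objects. */
static enum pack_action choose_pack_action(uint32_t nr_objects,
					   uint32_t unpack_limit)
{
	return nr_objects < unpack_limit ? UNPACK_TO_LOOSE : KEEP_AS_PACK;
}

/* Rough number of disk cache flushes for nr_loose new loose objects. */
static unsigned long expected_disk_flushes(unsigned long nr_loose, int batch)
{
	if (batch)
		return nr_loose ? 1 : 0; /* one flush at unplug time */
	return nr_loose;                 /* one fsync per object */
}
```

With the default limit of 100, a fetch of 99 new objects would be
unpacked, and in batch mode would need a single flush instead of 99.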

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295ba..51eb4f7b531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)
-- 
gitgitgadget



* [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (6 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-10-04 16:57               ` [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add test cases to exercise batch mode for:
 * 'git add'
 * 'git stash'
 * 'git update-index'
 * 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper, lib-unique-files.sh. The
goal of this library is to create a tree of files whose oids differ from
those of any other files that may have been created in the current test
repo, so that a test checking for a newly added object cannot pass
merely due
to it already being in the repo.
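
The content scheme used by test_create_unique_files can be restated in
C to show why the files never collide: a single counter runs across all
directories, and the $test_tick base changes on every call, so repeated
calls within one test also produce fresh contents. The function names
are hypothetical, and `basedata` stands in for $test_tick.

```c
#include <stdio.h>

/* Path of file `file` (1-based) in directory `dir` (1-based). */
static int unique_file_path(char *buf, size_t len, const char *basedir,
			    int dir, int file)
{
	return snprintf(buf, len, "%s/dir%d/file%d.txt", basedir, dir, file);
}

/*
 * Content written by `echo "$basedata.$counter"`: the shared counter
 * runs 1..dirs*files_per_dir across all directories, so no two files
 * (and hence no two blobs) share contents.
 */
static int unique_file_content(char *buf, size_t len, long basedata,
			       int files_per_dir, int dir, int file)
{
	int counter = (dir - 1) * files_per_dir + file;

	return snprintf(buf, len, "%ld.%d\n", basedata, counter);
}
```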

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/lib-unique-files.sh  | 36 ++++++++++++++++++++++++++++++++++++
 t/t3700-add.sh         | 20 ++++++++++++++++++++
 t/t3903-stash.sh       | 14 ++++++++++++++
 t/t5300-pack-object.sh | 30 +++++++++++++++++++-----------
 4 files changed, 89 insertions(+), 11 deletions(-)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a7de4ca8512
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,36 @@
+# Helper to create files with unique contents
+
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
+#					 each in my_dir, all with unique
+#					 contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+	local counter=0
+	test_tick
+	local basedata=$test_tick
+
+
+	rm -rf $basedir
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir"
+		for j in $(test_seq $files)
+		do
+			counter=$((counter + 1))
+			echo "$basedata.$counter"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 4086e1ebbc9..36049a53ff7 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -7,6 +7,8 @@ test_description='Test of git add, including the -- option.'
 
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -33,6 +35,24 @@ test_expect_success \
     'Test that "git add -- -q" works' \
     'touch -- -q && git add -- -q'
 
+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files2 &&
+	find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
+	rm -f fsynced_files2 &&
+	git ls-files --stage fsync-files2/ > fsynced_files2 &&
+	test_line_count = 8 fsynced_files2 &&
+	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 873aa56e359..2fc819e5584 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index e13a8842075..38663dc1393 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -162,23 +162,23 @@ test_expect_success 'pack-objects with bogus arguments' '
 
 check_unpack () {
 	test_when_finished "rm -rf git2" &&
-	git init --bare git2 &&
-	git -C git2 unpack-objects -n <"$1".pack &&
-	git -C git2 unpack-objects <"$1".pack &&
-	(cd .git && find objects -type f -print) |
-	while read path
-	do
-		cmp git2/$path .git/$path || {
-			echo $path differs.
-			return 1
-		}
-	done
+	git $2 init --bare git2 &&
+	(
+		git $2 -C git2 unpack-objects -n <"$1".pack &&
+		git $2 -C git2 unpack-objects <"$1".pack &&
+		git $2 -C git2 cat-file --batch-check="%(objectname)"
+	) <obj-list >current &&
+	cmp obj-list current
 }
 
 test_expect_success 'unpack without delta' '
 	check_unpack test-1-${packname_1}
 '
 
+test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
+	check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with REF_DELTA' '
 	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
 	check_deltas stderr -gt 0
@@ -188,6 +188,10 @@ test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
+test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
+       check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with OFS_DELTA' '
 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
 			<obj-list 2>stderr) &&
@@ -198,6 +202,10 @@ test_expect_success 'unpack with OFS_DELTA' '
 	check_unpack test-3-${packname_3}
 '
 
+test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
+       check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
 		defined($_ = -s $_) or die for @ARGV;
-- 
gitgitgadget



* [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (7 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-10-04 16:57               ` Neeraj Singh via GitGitGadget
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-10-04 16:57 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                 ` (8 preceding siblings ...)
  2021-10-04 16:57               ` [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-11-15 23:50               ` Neeraj K. Singh via GitGitGadget
  2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
                                   ` (9 more replies)
  9 siblings, 10 replies; 160+ messages in thread
From: Neeraj K. Singh via GitGitGadget @ 2021-11-15 23:50 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh

Thanks to everyone for review so far!

This series shares the base tmp-objdir patches with my merged version of
Elijah Newren's remerge-diff series at:
https://github.com/neerajsi-msft/git/tree/neerajsi/remerge-diff.

Changes between v8 and v9 [1]:

 * Rebased onto master at tag v2.34.0
 * Fixed git-prune bug when trying to clean up multiple cruft directories.
 * Preserve the tmp-objdir around update_relative_gitdir, which is called by
   setup_work_tree through the chdir_notify mechanism.
 * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
   'true', 'false', and 'batch'. This makes using old and new versions of
   git with 'batch' mode a little trickier, but hopefully people will
   generally be moving forward in versions.
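
For readers following along, here is a rough sketch of how a tri-state value like 'false'/'true'/'batch' could be parsed. The enum and function names are hypothetical and are not the ones used in the series. The boolean spellings only roughly follow Git's convention (a bare key with no value counts as true); real Git's boolean parser also accepts arbitrary integers, which this sketch does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical names; the series may spell these differently. */
enum fsync_objects_mode {
	FSYNC_OBJECTS_OFF,	/* 'false': never fsync loose objects */
	FSYNC_OBJECTS_ON,	/* 'true': fsync every loose object */
	FSYNC_OBJECTS_BATCH,	/* 'batch': one disk flush per batch */
	FSYNC_OBJECTS_INVALID	/* anything else */
};

/*
 * Parse a config value. A NULL value stands for a bare key with no '=',
 * which Git-style booleans treat as true.
 */
static enum fsync_objects_mode parse_fsync_objects(const char *value)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0", "" };
	size_t i;

	if (!value)
		return FSYNC_OBJECTS_ON;
	if (!strcasecmp(value, "batch"))
		return FSYNC_OBJECTS_BATCH;
	for (i = 0; i < sizeof(truthy) / sizeof(*truthy); i++)
		if (!strcasecmp(value, truthy[i]))
			return FSYNC_OBJECTS_ON;
	for (i = 0; i < sizeof(falsy) / sizeof(*falsy); i++)
		if (!strcasecmp(value, falsy[i]))
			return FSYNC_OBJECTS_OFF;
	return FSYNC_OBJECTS_INVALID;
}
```

An older Git that only parses core.fsyncObjectFiles as a boolean would presumably reject 'batch', which is the compatibility wrinkle mentioned above.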

[1] See
https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/

Changes between v7 and v8:

 * Dropped the tmp-objdir patch to avoid renaming in a quarantine/temporary
   objdir, as suggested by Jeff King. This wasn't a good idea because we
   don't really know that there's only a single reader/writer. Avoiding the
   rename was a relatively minor perf optimization so it's okay to drop.

 * Added disable_ref_updates logic (as a flag on the odb) which is set when
   we're in a quarantine or when a tmp objdir is active. I believe this
   roughly follows the strategy suggested by Jeff King.

Neeraj Singh (9):
  tmp-objdir: new API for creating temporary writable databases
  tmp-objdir: disable ref updates when replacing the primary odb
  bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  core.fsyncobjectfiles: batched disk flushes
  core.fsyncobjectfiles: add windows support for batch mode
  update-index: use the bulk-checkin infrastructure
  unpack-objects: use the bulk-checkin infrastructure
  core.fsyncobjectfiles: tests for batch mode
  core.fsyncobjectfiles: performance tests for add and stash

 Documentation/config/core.txt       | 29 ++++++++--
 Makefile                            |  6 ++
 builtin/prune.c                     | 23 ++++++--
 builtin/receive-pack.c              |  2 +-
 builtin/unpack-objects.c            |  3 +
 builtin/update-index.c              |  6 ++
 bulk-checkin.c                      | 90 +++++++++++++++++++++++++----
 bulk-checkin.h                      |  2 +
 cache.h                             |  8 ++-
 compat/mingw.h                      |  3 +
 compat/win32/flush.c                | 28 +++++++++
 config.c                            |  7 ++-
 config.mak.uname                    |  3 +
 configure.ac                        |  8 +++
 contrib/buildsystems/CMakeLists.txt |  3 +-
 environment.c                       | 11 +++-
 git-compat-util.h                   |  7 +++
 object-file.c                       | 60 ++++++++++++++++++-
 object-store.h                      | 26 +++++++++
 object.c                            |  2 +-
 refs.c                              |  2 +-
 repository.c                        |  2 +
 repository.h                        |  1 +
 t/lib-unique-files.sh               | 36 ++++++++++++
 t/perf/p3700-add.sh                 | 43 ++++++++++++++
 t/perf/p3900-stash.sh               | 46 +++++++++++++++
 t/t3700-add.sh                      | 20 +++++++
 t/t3903-stash.sh                    | 14 +++++
 t/t5300-pack-object.sh              | 30 ++++++----
 tmp-objdir.c                        | 55 +++++++++++++++++-
 tmp-objdir.h                        | 29 +++++++++-
 wrapper.c                           | 48 +++++++++++++++
 write-or-die.c                      |  2 +-
 33 files changed, 608 insertions(+), 47 deletions(-)
 create mode 100644 compat/win32/flush.c
 create mode 100644 t/lib-unique-files.sh
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh


base-commit: cd3e606211bb1cf8bc57f7d76bab98cc17a150bc
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1076%2Fneerajsi-msft%2Fneerajsi%2Fbulk-fsync-object-files-v9
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1076/neerajsi-msft/neerajsi/bulk-fsync-object-files-v9
Pull-Request: https://github.com/git/git/pull/1076

Range-diff vs v8:

  1:  f03797fd80d !  1:  6b27afa60e0 tmp-objdir: new API for creating temporary writable databases
     @@ Commit message
          Based-on-patch-by: Elijah Newren <newren@gmail.com>
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/prune.c ##
      @@ builtin/prune.c: static int show_only;
     @@ builtin/prune.c: static int prune_tmp_file(const char *fullpath)
      +		if (show_only || verbose)
      +			printf("Removing stale temporary directory %s\n", fullpath);
      +		if (!show_only) {
     ++			strbuf_reset(&remove_dir_buf);
      +			strbuf_addstr(&remove_dir_buf, fullpath);
      +			remove_dir_recursively(&remove_dir_buf, 0);
      +		}
     @@ builtin/receive-pack.c: static const char *unpack(int err_fd, struct shallow_inf
       		if (err_fd > 0)
       			close(err_fd);
      
     + ## environment.c ##
     +@@
     + #include "commit.h"
     + #include "strvec.h"
     + #include "object-store.h"
     ++#include "tmp-objdir.h"
     + #include "chdir-notify.h"
     + #include "shallow.h"
     + 
     +@@ environment.c: static void update_relative_gitdir(const char *name,
     + 				   void *data)
     + {
     + 	char *path = reparent_relative_path(old_cwd, new_cwd, get_git_dir());
     ++	struct tmp_objdir *tmp_objdir = tmp_objdir_unapply_primary_odb();
     + 	trace_printf_key(&trace_setup_key,
     + 			 "setup: move $GIT_DIR to '%s'",
     + 			 path);
     ++
     + 	set_git_dir_1(path);
     ++	if (tmp_objdir)
     ++		tmp_objdir_reapply_primary_odb(tmp_objdir, old_cwd, new_cwd);
     + 	free(path);
     + }
     + 
     +
       ## object-file.c ##
      @@ object-file.c: void add_to_alternates_memory(const char *reference)
       			     '\n', NULL, 0);
     @@ object.c: struct raw_object_store *raw_object_store_new(void)
       	odb_clear_loose_cache(odb);
      
       ## tmp-objdir.c ##
     +@@
     + #include "cache.h"
     + #include "tmp-objdir.h"
     ++#include "chdir-notify.h"
     + #include "dir.h"
     + #include "sigchain.h"
     + #include "string-list.h"
      @@
       struct tmp_objdir {
       	struct strbuf path;
       	struct strvec env;
      +	struct object_directory *prev_odb;
     ++	int will_destroy;
       };
       
       /*
     @@ tmp-objdir.c: void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
      +	if (t->prev_odb)
      +		BUG("the primary object database is already replaced");
      +	t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
     ++	t->will_destroy = will_destroy;
     ++}
     ++
     ++struct tmp_objdir *tmp_objdir_unapply_primary_odb(void)
     ++{
     ++	if (!the_tmp_objdir || !the_tmp_objdir->prev_odb)
     ++		return NULL;
     ++
     ++	restore_primary_odb(the_tmp_objdir->prev_odb, the_tmp_objdir->path.buf);
     ++	the_tmp_objdir->prev_odb = NULL;
     ++	return the_tmp_objdir;
     ++}
     ++
     ++void tmp_objdir_reapply_primary_odb(struct tmp_objdir *t, const char *old_cwd,
     ++		const char *new_cwd)
     ++{
     ++	char *path;
     ++
     ++	path = reparent_relative_path(old_cwd, new_cwd, t->path.buf);
     ++	strbuf_reset(&t->path);
     ++	strbuf_addstr(&t->path, path);
     ++	free(path);
     ++	tmp_objdir_replace_primary_odb(t, t->will_destroy);
      +}
      
       ## tmp-objdir.h ##
     @@ tmp-objdir.h: int tmp_objdir_destroy(struct tmp_objdir *);
      + * If will_destroy is nonzero, the object directory may not be migrated.
      + */
      +void tmp_objdir_replace_primary_odb(struct tmp_objdir *, int will_destroy);
     ++
     ++/*
     ++ * If the primary object database was replaced by a temporary object directory,
     ++ * restore it to its original value while keeping the directory contents around.
     ++ * Returns NULL if the primary object database was not replaced.
     ++ */
     ++struct tmp_objdir *tmp_objdir_unapply_primary_odb(void);
     ++
     ++/*
     ++ * Reapplies the former primary temporary object database, after potentially
     ++ * changing its relative path.
     ++ */
     ++void tmp_objdir_reapply_primary_odb(struct tmp_objdir *, const char *old_cwd,
     ++		const char *new_cwd);
     ++
      +
       #endif /* TMP_OBJDIR_H */
  2:  bc085137340 !  2:  71817cccfb9 tmp-objdir: disable ref updates when replacing the primary odb
     @@ Commit message
          Reported-by: Jeff King <peff@peff.net>
      
          Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## environment.c ##
      @@ environment.c: void setup_git_env(const char *git_dir)
  3:  9335646ed91 =  3:  8fd1ca4c00a bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  4:  b9d3d874432 =  4:  e1747ce00af core.fsyncobjectfiles: batched disk flushes
  5:  8df32eaaa9a !  5:  951a559874e core.fsyncobjectfiles: add windows support for batch mode
     @@ config.mak.uname: endif
       		compat/win32/path-utils.o \
       		compat/win32/pthread.o compat/win32/syslog.o \
       		compat/win32/trace2_win32_process_info.o \
     -@@ config.mak.uname: ifneq (,$(findstring MINGW,$(uname_S)))
     +@@ config.mak.uname: ifeq ($(uname_S),MINGW)
       	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
       	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
       		compat/win32/trace2_win32_process_info.o \
  6:  15767270984 =  6:  4a40fd4a29a update-index: use the bulk-checkin infrastructure
  7:  e88bab809a2 =  7:  cfc6a347d08 unpack-objects: use the bulk-checkin infrastructure
  8:  811d6d31509 !  8:  270c24827d0 core.fsyncobjectfiles: tests for batch mode
     @@ t/lib-unique-files.sh (new)
      
       ## t/t3700-add.sh ##
      @@ t/t3700-add.sh: test_description='Test of git add, including the -- option.'
     - 
     + TEST_PASSES_SANITIZE_LEAK=true
       . ./test-lib.sh
       
      +. $TEST_DIRECTORY/lib-unique-files.sh
  9:  f4fa20f591e =  9:  12d99641f4c core.fsyncobjectfiles: performance tests for add and stash

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
@ 2021-11-15 23:50                 ` Neeraj Singh via GitGitGadget
  2021-11-30 21:27                   ` Elijah Newren
  2021-11-15 23:50                 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
                                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:50 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The tmp_objdir API provides the ability to create temporary object
directories, but was designed with the goal of having subprocesses
access these object stores, followed by the main process migrating
objects from it to the main object store or just deleting it.  The
subprocesses would view it as their primary datastore and write to it.

Here we add the tmp_objdir_replace_primary_odb function that replaces
the current process's writable "main" object directory with the
specified one. The previous main object directory is restored in either
tmp_objdir_migrate or tmp_objdir_destroy.
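
As an illustration of the swap described above, the primary object directory can be treated as the head of a singly linked list whose tail holds the alternates. The sketch below uses simplified, hypothetical types (struct odb, struct repo_objects) rather than Git's struct object_directory, and assert() in place of BUG().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct object_directory. */
struct odb {
	char *path;
	int will_destroy;
	struct odb *next;	/* the primary is first; alternates follow */
};

struct repo_objects {
	struct odb *primary;
};

static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	return memcpy(copy, s, len);
}

/* Make 'dir' the writable primary; the old primary becomes the first alternate. */
static struct odb *push_temporary_primary(struct repo_objects *o,
					  const char *dir, int will_destroy)
{
	struct odb *new_odb = calloc(1, sizeof(*new_odb));

	if (!new_odb)
		abort();
	new_odb->path = dup_str(dir);
	new_odb->will_destroy = will_destroy;
	new_odb->next = o->primary;
	o->primary = new_odb;
	return new_odb->next;	/* caller keeps this to restore later */
}

/* Undo push_temporary_primary(), checking the list looks as expected. */
static void pop_temporary_primary(struct repo_objects *o,
				  struct odb *restore, const char *tmp_path)
{
	struct odb *cur = o->primary;

	assert(!strcmp(cur->path, tmp_path));
	assert(cur->next == restore);
	o->primary = restore;
	free(cur->path);
	free(cur);
}
```

Popping only frees the temporary list node, not the objects written into that directory; migrating or deleting those is left to tmp_objdir_migrate() or tmp_objdir_destroy() respectively.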

For the --remerge-diff use case, add a new `will_destroy` flag in `struct
object_directory` to mark ephemeral object databases that do not require
fsync durability.
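
The effect of the flag on loose-object writes can be sketched as follows, loosely mirroring the close_loose_object() change in this patch. The global and struct here are simplified, hypothetical stand-ins for fsync_object_files and struct object_directory, and errors are reported and returned instead of dying as fsync_or_die() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for Git's global setting and odb struct. */
static int fsync_object_files = 1;

struct odb {
	int will_destroy;	/* ephemeral store: durability not needed */
};

/* fsync only when enabled and the primary store will outlive us. */
static int should_fsync_loose_object(const struct odb *primary)
{
	return fsync_object_files && !primary->will_destroy;
}

/* Finalize a loose object file; returns 0 on success, -1 on error. */
static int close_loose_object_sketch(int fd, const struct odb *primary)
{
	if (should_fsync_loose_object(primary) && fsync(fd) < 0) {
		perror("fsync loose object file");
		close(fd);
		return -1;
	}
	if (close(fd) != 0) {
		perror("close loose object file");
		return -1;
	}
	return 0;
}
```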

Add 'git prune' support for removing temporary object databases, and
make sure that they have a name starting with tmp_ and containing an
operation-specific name.
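
A rough sketch of the naming convention and of the check prune applies to it. The helper names are hypothetical; the template string matches the one tmp_objdir_create() now builds, and the age check mirrors prune_tmp_file(), which skips entries whose mtime is newer than the expiry cutoff. Note that the old receive-pack name, incoming-XXXXXX, would not match the tmp_ prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Build the mkdtemp()-style template used for a temporary objdir. */
static int tmp_objdir_template(char *buf, size_t len,
			       const char *objdir, const char *prefix)
{
	int n = snprintf(buf, len, "%s/tmp_objdir-%s-XXXXXX", objdir, prefix);

	return (n >= 0 && (size_t)n < len) ? 0 : -1;	/* -1 if truncated */
}

/* Would prune remove this entry of the objects directory? */
static int is_stale_tmp_entry(const char *basename, time_t mtime, time_t expire)
{
	if (strncmp(basename, "tmp_", 4))
		return 0;	/* not a temporary file or directory */
	return mtime <= expire;	/* prune keeps anything newer than expire */
}
```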

Based-on-patch-by: Elijah Newren <newren@gmail.com>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/prune.c        | 23 +++++++++++++++---
 builtin/receive-pack.c |  2 +-
 environment.c          |  5 ++++
 object-file.c          | 44 +++++++++++++++++++++++++++++++--
 object-store.h         | 19 +++++++++++++++
 object.c               |  2 +-
 tmp-objdir.c           | 55 +++++++++++++++++++++++++++++++++++++++---
 tmp-objdir.h           | 29 +++++++++++++++++++---
 8 files changed, 165 insertions(+), 14 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index 485c9a3c56f..a76e6a5f0e8 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -18,6 +18,7 @@ static int show_only;
 static int verbose;
 static timestamp_t expire;
 static int show_progress = -1;
+static struct strbuf remove_dir_buf = STRBUF_INIT;
 
 static int prune_tmp_file(const char *fullpath)
 {
@@ -26,10 +27,20 @@ static int prune_tmp_file(const char *fullpath)
 		return error("Could not stat '%s'", fullpath);
 	if (st.st_mtime > expire)
 		return 0;
-	if (show_only || verbose)
-		printf("Removing stale temporary file %s\n", fullpath);
-	if (!show_only)
-		unlink_or_warn(fullpath);
+	if (S_ISDIR(st.st_mode)) {
+		if (show_only || verbose)
+			printf("Removing stale temporary directory %s\n", fullpath);
+		if (!show_only) {
+			strbuf_reset(&remove_dir_buf);
+			strbuf_addstr(&remove_dir_buf, fullpath);
+			remove_dir_recursively(&remove_dir_buf, 0);
+		}
+	} else {
+		if (show_only || verbose)
+			printf("Removing stale temporary file %s\n", fullpath);
+		if (!show_only)
+			unlink_or_warn(fullpath);
+	}
 	return 0;
 }
 
@@ -97,6 +108,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
 
 static int prune_subdir(unsigned int nr, const char *path, void *data)
 {
+	if (verbose)
+		printf("Removing directory %s\n", path);
+
 	if (!show_only)
 		rmdir(path);
 	return 0;
@@ -184,5 +198,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 		prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
 	}
 
+	strbuf_release(&remove_dir_buf);
 	return 0;
 }
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 49b846d9605..8815e24cde5 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2213,7 +2213,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		strvec_push(&child.args, alt_shallow_file);
 	}
 
-	tmp_objdir = tmp_objdir_create();
+	tmp_objdir = tmp_objdir_create("incoming");
 	if (!tmp_objdir) {
 		if (err_fd > 0)
 			close(err_fd);
diff --git a/environment.c b/environment.c
index 9da7f3c1a19..342400fcaad 100644
--- a/environment.c
+++ b/environment.c
@@ -17,6 +17,7 @@
 #include "commit.h"
 #include "strvec.h"
 #include "object-store.h"
+#include "tmp-objdir.h"
 #include "chdir-notify.h"
 #include "shallow.h"
 
@@ -331,10 +332,14 @@ static void update_relative_gitdir(const char *name,
 				   void *data)
 {
 	char *path = reparent_relative_path(old_cwd, new_cwd, get_git_dir());
+	struct tmp_objdir *tmp_objdir = tmp_objdir_unapply_primary_odb();
 	trace_printf_key(&trace_setup_key,
 			 "setup: move $GIT_DIR to '%s'",
 			 path);
+
 	set_git_dir_1(path);
+	if (tmp_objdir)
+		tmp_objdir_reapply_primary_odb(tmp_objdir, old_cwd, new_cwd);
 	free(path);
 }
 
diff --git a/object-file.c b/object-file.c
index c3d866a287e..0b6a61aeaff 100644
--- a/object-file.c
+++ b/object-file.c
@@ -683,6 +683,43 @@ void add_to_alternates_memory(const char *reference)
 			     '\n', NULL, 0);
 }
 
+struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy)
+{
+	struct object_directory *new_odb;
+
+	/*
+	 * Make sure alternates are initialized, or else our entry may be
+	 * overwritten when they are.
+	 */
+	prepare_alt_odb(the_repository);
+
+	/*
+	 * Make a new primary odb and link the old primary ODB in as an
+	 * alternate
+	 */
+	new_odb = xcalloc(1, sizeof(*new_odb));
+	new_odb->path = xstrdup(dir);
+	new_odb->will_destroy = will_destroy;
+	new_odb->next = the_repository->objects->odb;
+	the_repository->objects->odb = new_odb;
+	return new_odb->next;
+}
+
+void restore_primary_odb(struct object_directory *restore_odb, const char *old_path)
+{
+	struct object_directory *cur_odb = the_repository->objects->odb;
+
+	if (strcmp(old_path, cur_odb->path))
+		BUG("expected %s as primary object store; found %s",
+		    old_path, cur_odb->path);
+
+	if (cur_odb->next != restore_odb)
+		BUG("we expect the old primary object store to be the first alternate");
+
+	the_repository->objects->odb = restore_odb;
+	free_object_directory(cur_odb);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
@@ -1809,8 +1846,11 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 /* Finalize a file on disk, and close it. */
 static void close_loose_object(int fd)
 {
-	if (fsync_object_files)
-		fsync_or_die(fd, "loose object file");
+	if (!the_repository->objects->odb->will_destroy) {
+		if (fsync_object_files)
+			fsync_or_die(fd, "loose object file");
+	}
+
 	if (close(fd) != 0)
 		die_errno(_("error when closing loose object file"));
 }
diff --git a/object-store.h b/object-store.h
index 952efb6a4be..cb173e69392 100644
--- a/object-store.h
+++ b/object-store.h
@@ -27,6 +27,11 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This object store is ephemeral, so there is no need to fsync.
+	 */
+	int will_destroy;
+
 	/*
 	 * Path to the alternative object store. If this is a relative path,
 	 * it is relative to the current working directory.
@@ -58,6 +63,17 @@ void add_to_alternates_file(const char *dir);
  */
 void add_to_alternates_memory(const char *dir);
 
+/*
+ * Replace the current writable object directory with the specified temporary
+ * object directory; returns the former primary object directory.
+ */
+struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy);
+
+/*
+ * Restore a previous ODB replaced by set_temporary_primary_odb.
+ */
+void restore_primary_odb(struct object_directory *restore_odb, const char *old_path);
+
 /*
  * Populate and return the loose object cache array corresponding to the
  * given object ID.
@@ -68,6 +84,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
 /* Empty the loose object cache for the specified object directory. */
 void odb_clear_loose_cache(struct object_directory *odb);
 
+/* Clear and free the specified object directory */
+void free_object_directory(struct object_directory *odb);
+
 struct packed_git {
 	struct hashmap_entry packmap_ent;
 	struct packed_git *next;
diff --git a/object.c b/object.c
index 23a24e678a8..048f96a260e 100644
--- a/object.c
+++ b/object.c
@@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
 	return o;
 }
 
-static void free_object_directory(struct object_directory *odb)
+void free_object_directory(struct object_directory *odb)
 {
 	free(odb->path);
 	odb_clear_loose_cache(odb);
diff --git a/tmp-objdir.c b/tmp-objdir.c
index b8d880e3626..3d38eeab66b 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "tmp-objdir.h"
+#include "chdir-notify.h"
 #include "dir.h"
 #include "sigchain.h"
 #include "string-list.h"
@@ -11,6 +12,8 @@
 struct tmp_objdir {
 	struct strbuf path;
 	struct strvec env;
+	struct object_directory *prev_odb;
+	int will_destroy;
 };
 
 /*
@@ -38,6 +41,9 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	if (t == the_tmp_objdir)
 		the_tmp_objdir = NULL;
 
+	if (!on_signal && t->prev_odb)
+		restore_primary_odb(t->prev_odb, t->path.buf);
+
 	/*
 	 * This may use malloc via strbuf_grow(), but we should
 	 * have pre-grown t->path sufficiently so that this
@@ -52,6 +58,7 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
 	 */
 	if (!on_signal)
 		tmp_objdir_free(t);
+
 	return err;
 }
 
@@ -121,7 +128,7 @@ static int setup_tmp_objdir(const char *root)
 	return ret;
 }
 
-struct tmp_objdir *tmp_objdir_create(void)
+struct tmp_objdir *tmp_objdir_create(const char *prefix)
 {
 	static int installed_handlers;
 	struct tmp_objdir *t;
@@ -129,11 +136,16 @@ struct tmp_objdir *tmp_objdir_create(void)
 	if (the_tmp_objdir)
 		BUG("only one tmp_objdir can be used at a time");
 
-	t = xmalloc(sizeof(*t));
+	t = xcalloc(1, sizeof(*t));
 	strbuf_init(&t->path, 0);
 	strvec_init(&t->env);
 
-	strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
+	/*
+	 * Use a string starting with tmp_ so that the builtin/prune.c code
+	 * can recognize any stale objdirs left behind by a crash and delete
+	 * them.
+	 */
+	strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX", get_object_directory(), prefix);
 
 	/*
 	 * Grow the strbuf beyond any filename we expect to be placed in it.
@@ -269,6 +281,13 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
 	if (!t)
 		return 0;
 
+	if (t->prev_odb) {
+		if (the_repository->objects->odb->will_destroy)
+			BUG("migrating an ODB that was marked for destruction");
+		restore_primary_odb(t->prev_odb, t->path.buf);
+		t->prev_odb = NULL;
+	}
+
 	strbuf_addbuf(&src, &t->path);
 	strbuf_addstr(&dst, get_object_directory());
 
@@ -292,3 +311,33 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
 {
 	add_to_alternates_memory(t->path.buf);
 }
+
+void tmp_objdir_replace_primary_odb(struct tmp_objdir *t, int will_destroy)
+{
+	if (t->prev_odb)
+		BUG("the primary object database is already replaced");
+	t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
+	t->will_destroy = will_destroy;
+}
+
+struct tmp_objdir *tmp_objdir_unapply_primary_odb(void)
+{
+	if (!the_tmp_objdir || !the_tmp_objdir->prev_odb)
+		return NULL;
+
+	restore_primary_odb(the_tmp_objdir->prev_odb, the_tmp_objdir->path.buf);
+	the_tmp_objdir->prev_odb = NULL;
+	return the_tmp_objdir;
+}
+
+void tmp_objdir_reapply_primary_odb(struct tmp_objdir *t, const char *old_cwd,
+		const char *new_cwd)
+{
+	char *path;
+
+	path = reparent_relative_path(old_cwd, new_cwd, t->path.buf);
+	strbuf_reset(&t->path);
+	strbuf_addstr(&t->path, path);
+	free(path);
+	tmp_objdir_replace_primary_odb(t, t->will_destroy);
+}
diff --git a/tmp-objdir.h b/tmp-objdir.h
index b1e45b4c75d..a3145051f25 100644
--- a/tmp-objdir.h
+++ b/tmp-objdir.h
@@ -10,7 +10,7 @@
  *
  * Example:
  *
- *	struct tmp_objdir *t = tmp_objdir_create();
+ *	struct tmp_objdir *t = tmp_objdir_create("incoming");
  *	if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
  *	    !tmp_objdir_migrate(t))
  *		printf("success!\n");
@@ -22,9 +22,10 @@
 struct tmp_objdir;
 
 /*
- * Create a new temporary object directory; returns NULL on failure.
+ * Create a new temporary object directory with the specified prefix;
+ * returns NULL on failure.
  */
-struct tmp_objdir *tmp_objdir_create(void);
+struct tmp_objdir *tmp_objdir_create(const char *prefix);
 
 /*
  * Return a list of environment strings, suitable for use with
@@ -51,4 +52,26 @@ int tmp_objdir_destroy(struct tmp_objdir *);
  */
 void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
 
+/*
+ * Replaces the main object store in the current process with the temporary
+ * object directory and makes the former main object store an alternate.
+ * If will_destroy is nonzero, the object directory may not be migrated.
+ */
+void tmp_objdir_replace_primary_odb(struct tmp_objdir *, int will_destroy);
+
+/*
+ * If the primary object database was replaced by a temporary object directory,
+ * restore it to its original value while keeping the directory contents around.
+ * Returns NULL if the primary object database was not replaced.
+ */
+struct tmp_objdir *tmp_objdir_unapply_primary_odb(void);
+
+/*
+ * Reapplies the former primary temporary object database, after potentially
+ * changing its relative path.
+ */
+void tmp_objdir_reapply_primary_odb(struct tmp_objdir *, const char *old_cwd,
+		const char *new_cwd);
+
+
 #endif /* TMP_OBJDIR_H */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
@ 2021-11-15 23:50                 ` Neeraj Singh via GitGitGadget
  2021-11-16  7:23                   ` Ævar Arnfjörð Bjarmason
  2021-11-15 23:50                 ` [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
                                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:50 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When creating a subprocess with a temporary ODB, we set the
GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not
to update refs, since the tmp-objdir may go away.

Introduce a similar mechanism for in-process temporary ODBs when
we call tmp_objdir_replace_primary_odb. Now both mechanisms set
the disable_ref_updates flag on the odb, which is queried by
the ref_transaction_prepare function.
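
The two ways the flag gets set, and the single place it is checked, can be sketched with a simplified, hypothetical struct odb. The error string is the one from ref_transaction_prepare(); passing the environment value in as a parameter (rather than calling getenv() on GIT_QUARANTINE_ENVIRONMENT) is only to keep the sketch easy to exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for struct object_directory. */
struct odb {
	unsigned int disable_ref_updates : 1;
	struct odb *next;
};

/* Subprocess case: setup sees the quarantine variable in its environment. */
static void apply_quarantine_env(struct odb *primary, const char *quarantine_value)
{
	if (quarantine_value)
		primary->disable_ref_updates = 1;
}

/* In-process case: a temporary odb becomes the primary. */
static void push_temporary_primary(struct odb **primary, struct odb *tmp)
{
	tmp->disable_ref_updates = 1;
	tmp->next = *primary;
	*primary = tmp;
}

/* The check made before any ref transaction is prepared. */
static int prepare_ref_transaction(const struct odb *primary,
				   char *err, size_t errlen)
{
	if (primary->disable_ref_updates) {
		snprintf(err, errlen,
			 "ref updates forbidden inside quarantine environment");
		return -1;
	}
	return 0;
}
```

Because the check reads only the current primary, restoring the original odb re-enables ref updates, unless the process itself was started inside a quarantine environment.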

Note: This change adds an assumption that the state of
the_repository is relevant for any ref transaction that might
be initiated. Unwinding this assumption should be straightforward
by saving the relevant repository to query in the transaction or
the ref_store.

Peff's test case was invoking ref updates via the cachetextconv
setting. That particular code silently does nothing when a ref
update is forbidden. See the call to notes_cache_put in
fill_textconv where errors are ignored.

Reported-by: Jeff King <peff@peff.net>

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 environment.c  | 4 ++++
 object-file.c  | 6 ++++++
 object-store.h | 9 ++++++++-
 refs.c         | 2 +-
 repository.c   | 2 ++
 repository.h   | 1 +
 6 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/environment.c b/environment.c
index 342400fcaad..2701dfeeec8 100644
--- a/environment.c
+++ b/environment.c
@@ -169,6 +169,10 @@ void setup_git_env(const char *git_dir)
 	args.graft_file = getenv_safe(&to_free, GRAFT_ENVIRONMENT);
 	args.index_file = getenv_safe(&to_free, INDEX_ENVIRONMENT);
 	args.alternate_db = getenv_safe(&to_free, ALTERNATE_DB_ENVIRONMENT);
+	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+		args.disable_ref_updates = 1;
+	}
+
 	repo_set_gitdir(the_repository, git_dir, &args);
 	strvec_clear(&to_free);
 
diff --git a/object-file.c b/object-file.c
index 0b6a61aeaff..659ef7623ff 100644
--- a/object-file.c
+++ b/object-file.c
@@ -699,6 +699,12 @@ struct object_directory *set_temporary_primary_odb(const char *dir, int will_des
 	 */
 	new_odb = xcalloc(1, sizeof(*new_odb));
 	new_odb->path = xstrdup(dir);
+
+	/*
+	 * Disable ref updates while a temporary odb is active, since
+	 * the objects in the database may roll back.
+	 */
+	new_odb->disable_ref_updates = 1;
 	new_odb->will_destroy = will_destroy;
 	new_odb->next = the_repository->objects->odb;
 	the_repository->objects->odb = new_odb;
diff --git a/object-store.h b/object-store.h
index cb173e69392..9ae9262c340 100644
--- a/object-store.h
+++ b/object-store.h
@@ -27,10 +27,17 @@ struct object_directory {
 	uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
 	struct oidtree *loose_objects_cache;
 
+	/*
+	 * This is a temporary object store created by the tmp_objdir
+	 * facility. Disable ref updates since the objects in the store
+	 * might be discarded on rollback.
+	 */
+	unsigned int disable_ref_updates : 1;
+
 	/*
 	 * This object store is ephemeral, so there is no need to fsync.
 	 */
-	int will_destroy;
+	unsigned int will_destroy : 1;
 
 	/*
 	 * Path to the alternative object store. If this is a relative path,
diff --git a/refs.c b/refs.c
index d7cc0a23a3b..27ec7d1fc64 100644
--- a/refs.c
+++ b/refs.c
@@ -2137,7 +2137,7 @@ int ref_transaction_prepare(struct ref_transaction *transaction,
 		break;
 	}
 
-	if (getenv(GIT_QUARANTINE_ENVIRONMENT)) {
+	if (the_repository->objects->odb->disable_ref_updates) {
 		strbuf_addstr(err,
 			      _("ref updates forbidden inside quarantine environment"));
 		return -1;
diff --git a/repository.c b/repository.c
index c5b90ba93ea..dce8e35ac20 100644
--- a/repository.c
+++ b/repository.c
@@ -80,6 +80,8 @@ void repo_set_gitdir(struct repository *repo,
 	expand_base_dir(&repo->objects->odb->path, o->object_dir,
 			repo->commondir, "objects");
 
+	repo->objects->odb->disable_ref_updates = o->disable_ref_updates;
+
 	free(repo->objects->alternate_db);
 	repo->objects->alternate_db = xstrdup_or_null(o->alternate_db);
 	expand_base_dir(&repo->graft_file, o->graft_file,
diff --git a/repository.h b/repository.h
index a057653981c..7c04e99ac5c 100644
--- a/repository.h
+++ b/repository.h
@@ -158,6 +158,7 @@ struct set_gitdir_args {
 	const char *graft_file;
 	const char *index_file;
 	const char *alternate_db;
+	int disable_ref_updates;
 };
 
 void repo_set_gitdir(struct repository *repo, const char *root,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 160+ messages in thread

* [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
  2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
  2021-11-15 23:50                 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
@ 2021-11-15 23:50                 ` Neeraj Singh via GitGitGadget
  2021-11-15 23:50                 ` [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
                                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:50 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Preparation for adding bulk-fsync to the bulk-checkin.c infrastructure.

* Rename 'state' variable to 'bulk_checkin_state', since we will later
  be adding 'bulk_fsync_state'.  This also makes the variable easier to
  find in the debugger, since the name is more unique.

* Move the 'plugged' data member of 'bulk_checkin_state' into a separate
  static variable. Doing this avoids resetting the variable in
  finish_bulk_checkin when zeroing the 'bulk_checkin_state'. As-is, we
  seem to unintentionally disable the plugging functionality the first
  time a new packfile must be created due to packfile size limits. While
  disabling the plugging state only results in suboptimal behavior for
  the current code, it would be fatal for the bulk-fsync functionality
  later in this patch series.
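The bug described above can be shown with a reduced model: the reset that zeroes the state also clears a flag stored inside it, while a flag kept outside the state survives. The struct and function names below are hypothetical simplifications, not the real bulk-checkin.c definitions.

```c
#include <assert.h>
#include <string.h>

/* Before this patch (simplified): the plugged flag lives inside the
 * state that is zeroed whenever a packfile is finished. */
struct state_with_flag {
	unsigned plugged:1;
	unsigned nr_written;
};

/* After this patch: the flag is a separate static variable. */
static int plugged_flag;

struct state_without_flag {
	unsigned nr_written;
};

/* Stand-ins for the memset() reset at the end of finish_bulk_checkin(). */
static void reset_with_flag(struct state_with_flag *s)
{
	memset(s, 0, sizeof(*s));
}

static void reset_without_flag(struct state_without_flag *s)
{
	memset(s, 0, sizeof(*s));
}
```

With the flag inside the struct, a reset in the middle of a plugged session silently turns plugging off; with the flag outside, only the per-packfile bookkeeping is cleared.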

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 bulk-checkin.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/bulk-checkin.c b/bulk-checkin.c
index 8785b2ac806..6ae18401e04 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -10,9 +10,9 @@
 #include "packfile.h"
 #include "object-store.h"
 
-static struct bulk_checkin_state {
-	unsigned plugged:1;
+static int bulk_checkin_plugged;
 
+static struct bulk_checkin_state {
 	char *pack_tmp_name;
 	struct hashfile *f;
 	off_t offset;
@@ -21,7 +21,7 @@ static struct bulk_checkin_state {
 	struct pack_idx_entry **written;
 	uint32_t alloc_written;
 	uint32_t nr_written;
-} state;
+} bulk_checkin_state;
 
 static void finish_tmp_packfile(struct strbuf *basename,
 				const char *pack_tmp_name,
@@ -277,21 +277,23 @@ int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
 {
-	int status = deflate_to_pack(&state, oid, fd, size, type,
+	int status = deflate_to_pack(&bulk_checkin_state, oid, fd, size, type,
 				     path, flags);
-	if (!state.plugged)
-		finish_bulk_checkin(&state);
+	if (!bulk_checkin_plugged)
+		finish_bulk_checkin(&bulk_checkin_state);
 	return status;
 }
 
 void plug_bulk_checkin(void)
 {
-	state.plugged = 1;
+	assert(!bulk_checkin_plugged);
+	bulk_checkin_plugged = 1;
 }
 
 void unplug_bulk_checkin(void)
 {
-	state.plugged = 0;
-	if (state.f)
-		finish_bulk_checkin(&state);
+	assert(bulk_checkin_plugged);
+	bulk_checkin_plugged = 0;
+	if (bulk_checkin_state.f)
+		finish_bulk_checkin(&bulk_checkin_state);
 }
-- 
gitgitgadget



* [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (2 preceding siblings ...)
  2021-11-15 23:50                 ` [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
@ 2021-11-15 23:50                 ` Neeraj Singh via GitGitGadget
  2021-11-15 23:50                 ` [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
                                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:50 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

When adding many objects to a repo with core.fsyncObjectFiles set to
true, the cost of fsync'ing each object file can become prohibitive.

One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. Fortunately, Windows
and macOS offer mechanisms to write data from the filesystem page cache
without initiating a hardware flush. Linux has the sync_file_range API,
which issues a pagecache writeback request reliably after version 5.2.

This patch introduces a new 'core.fsyncObjectFiles = batch' option that
batches up hardware flushes. It hooks into the bulk-checkin plugging and
unplugging functionality and takes advantage of tmp-objdir.

When the new mode is enabled, we do the following for each new object:
1. Create the object in a tmp-objdir.
2. Issue a pagecache writeback request and wait for it to complete.

At the end of the entire transaction when unplugging bulk checkin:
1. Issue an fsync against a dummy file to flush the hardware writeback
   cache, which should by now have processed the tmp-objdir writes.
2. Rename all of the tmp-objdir files to their final names.
3. When updating the index and/or refs, we assume that Git will issue
   another fsync internal to that operation. This is not the case today,
   but may be a good extension to those components.
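The per-object and unplug steps above can be sketched with plain POSIX calls. This is a simplified illustration that assumes two ordinary directories stand in for the tmp-objdir and the real object directory; `writeout_only` and `batch_write` are hypothetical names, and the sketch skips hashing, zlib compression and the fan-out subdirectories that real loose objects use. As in the patch, a failed writeout-only request falls back to a full fsync().

```c
#define _GNU_SOURCE /* sync_file_range() and mkdtemp() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Per-object step 2: write dirty pages back without a hardware cache
 * flush where the OS allows it; otherwise fall back to fsync(). */
static int writeout_only(int fd)
{
#ifdef __linux__
	if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
				      SYNC_FILE_RANGE_WRITE |
				      SYNC_FILE_RANGE_WAIT_AFTER) == 0)
		return 0;
#endif
	return fsync(fd);
}

/* Write n files into tmp_dir, flush once, then rename them into
 * final_dir. Returns 0 on success and -1 on the first error. */
static int batch_write(const char *tmp_dir, const char *final_dir,
		       const char *const *names, const char *const *bodies,
		       int n)
{
	char src[4096], dst[4096];
	int i, fd, ok;

	for (i = 0; i < n; i++) {
		size_t len = strlen(bodies[i]);

		snprintf(src, sizeof(src), "%s/%s", tmp_dir, names[i]);
		fd = open(src, O_WRONLY | O_CREAT | O_EXCL, 0644);
		if (fd < 0)
			return -1;
		ok = write(fd, bodies[i], len) == (ssize_t)len &&
		     writeout_only(fd) == 0;
		if (close(fd) != 0 || !ok)
			return -1;
	}

	/* Unplug step 1: one fsync() on a dummy file to flush the disk cache. */
	snprintf(src, sizeof(src), "%s/flush_XXXXXX", tmp_dir);
	fd = mkstemp(src);
	if (fd < 0)
		return -1;
	ok = fsync(fd) == 0;
	close(fd);
	unlink(src);
	if (!ok)
		return -1;

	/* Unplug step 2: move every file to its final name. */
	for (i = 0; i < n; i++) {
		snprintf(src, sizeof(src), "%s/%s", tmp_dir, names[i]);
		snprintf(dst, sizeof(dst), "%s/%s", final_dir, names[i]);
		if (rename(src, dst) != 0)
			return -1;
	}
	return 0;
}
```

Only the dummy-file fsync() is expected to send a cache flush to the drive; the renames happen after it, so no final name points at data that has not been flushed.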

On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename), such as NTFS, HFS+, or XFS,
we would expect the fsync to trigger a journal writeout, so this
sequence should be enough to ensure that the user's data is durable by
the time the git command returns.

This change also updates the macOS code to trigger a real hardware flush
via fcntl(fd, F_FULLFSYNC) when fsync_or_die is called. Previously, on
macOS there was no guarantee of durability since a simple fsync(2) call
does not flush any hardware caches.
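A minimal sketch of a flush helper that reaches the drive on macOS. Unlike the patch, which calls fcntl(fd, F_FULLFSYNC) with no fallback, this sketch retries with fsync() if F_FULLFSYNC is rejected, since a filesystem may not support it; that fallback is a design choice of the sketch, not something the patch does. The helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Request a flush that reaches stable storage. On macOS, fsync(2)
 * alone does not flush the drive's write cache, so F_FULLFSYNC is
 * tried first; elsewhere fsync(2) is assumed to include the flush. */
static int full_flush(int fd)
{
#ifdef F_FULLFSYNC
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return 0;
	/* F_FULLFSYNC was rejected; fall back to a plain fsync(). */
#endif
	return fsync(fd);
}

/* Retry on EINTR like fsync_or_die(), but report failure instead of dying. */
static int full_flush_retry(int fd)
{
	while (full_flush(fd) < 0) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}
```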

_Performance numbers_:

Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
	  This number is from a patch later in the series.

Adding 500 files to the repo with 'git add'. Times are reported in seconds.

core.fsyncObjectFiles | Linux | Mac   | Windows
----------------------|-------|-------|--------
                false | 0.06  |  0.35 | 0.61
                true  | 1.88  | 11.18 | 2.47
                batch | 0.15  |  0.41 | 1.53
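The table's ratios can be computed directly. This small program only restates the measurements above; `batch_speedup` and `batch_overhead_ms_per_file` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds to add 500 files with 'git add', from the table above. */
struct timing {
	const char *os;
	double off, on, batch;
};

static const struct timing timings[] = {
	{ "Linux",   0.06,  1.88, 0.15 },
	{ "Mac",     0.35, 11.18, 0.41 },
	{ "Windows", 0.61,  2.47, 1.53 },
};

/* How many times faster 'batch' is than 'true'. */
static double batch_speedup(const struct timing *t)
{
	return t->on / t->batch;
}

/* Extra cost per file of 'batch' over 'false', in milliseconds. */
static double batch_overhead_ms_per_file(const struct timing *t, int files)
{
	return (t->batch - t->off) * 1000.0 / files;
}

static void print_summary(void)
{
	for (size_t i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
		printf("%-8s batch is %.1fx faster than true, +%.2f ms/file over false\n",
		       timings[i].os, batch_speedup(&timings[i]),
		       batch_overhead_ms_per_file(&timings[i], 500));
}
```

By these numbers, batch mode is roughly 12.5x (Linux), 27x (Mac) and 1.6x (Windows) faster than `true`.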

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 Documentation/config/core.txt | 29 +++++++++++----
 Makefile                      |  6 ++++
 bulk-checkin.c                | 68 +++++++++++++++++++++++++++++++++++
 bulk-checkin.h                |  2 ++
 cache.h                       |  8 ++++-
 config.c                      |  7 +++-
 config.mak.uname              |  1 +
 configure.ac                  |  8 +++++
 environment.c                 |  2 +-
 git-compat-util.h             |  7 ++++
 object-file.c                 | 12 ++++++-
 wrapper.c                     | 44 +++++++++++++++++++++++
 write-or-die.c                |  2 +-
 13 files changed, 185 insertions(+), 11 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c04f62a54a1..200b4d9f06e 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -548,12 +548,29 @@ core.whitespace::
   errors. The default tab width is 8. Allowed values are 1 to 63.
 
 core.fsyncObjectFiles::
-	This boolean will enable 'fsync()' when writing object files.
-+
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with "data=writeback").
+	A value indicating the level of effort Git will expend in
+	trying to make objects added to the repo durable in the event
+	of an unclean system shutdown. This setting currently only
+	controls loose objects in the object store, so updates to any
+	refs or the index may not be equally durable.
++
+* `false` allows data to remain in file system caches according to
+  operating system policy, whence it may be lost if the system loses power
+  or crashes.
+* `true` triggers a data integrity flush for each loose object added to the
+  object store. This is the safest setting that is likely to ensure durability
+  across all operating systems and file systems that honor the 'fsync' system
+  call. However, this setting comes with a significant performance cost on
+  common hardware. Git does not currently fsync parent directories for
+  newly-added files, so some filesystems may still allow data to be lost on
+  system crash.
+* `batch` enables an experimental mode that uses interfaces available in some
+  operating systems to write loose object data with a minimal set of FLUSH
+  CACHE (or equivalent) commands sent to the storage controller. If the
+  operating system interfaces are not available, this mode behaves the same as
+  `true`. This mode is expected to be as safe as `true` on macOS for repos
+  stored on HFS+ or APFS filesystems and on Windows for repos stored on NTFS or
+  ReFS.
 
 core.preloadIndex::
 	Enable parallel index preload for operations like 'git diff'
diff --git a/Makefile b/Makefile
index 12be39ac497..241dc322c09 100644
--- a/Makefile
+++ b/Makefile
@@ -406,6 +406,8 @@ all::
 #
 # Define HAVE_CLOCK_MONOTONIC if your platform has CLOCK_MONOTONIC.
 #
+# Define HAVE_SYNC_FILE_RANGE if your platform has sync_file_range.
+#
 # Define NEEDS_LIBRT if your platform requires linking with librt (glibc version
 # before 2.17) for clock_gettime and CLOCK_MONOTONIC.
 #
@@ -1884,6 +1886,10 @@ ifdef HAVE_CLOCK_MONOTONIC
 	BASIC_CFLAGS += -DHAVE_CLOCK_MONOTONIC
 endif
 
+ifdef HAVE_SYNC_FILE_RANGE
+	BASIC_CFLAGS += -DHAVE_SYNC_FILE_RANGE
+endif
+
 ifdef NEEDS_LIBRT
 	EXTLIBS += -lrt
 endif
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 6ae18401e04..4deee1af46e 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -3,14 +3,20 @@
  */
 #include "cache.h"
 #include "bulk-checkin.h"
+#include "lockfile.h"
 #include "repository.h"
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "string-list.h"
+#include "tmp-objdir.h"
 #include "packfile.h"
 #include "object-store.h"
 
 static int bulk_checkin_plugged;
+static int needs_batch_fsync;
+
+static struct tmp_objdir *bulk_fsync_objdir;
 
 static struct bulk_checkin_state {
 	char *pack_tmp_name;
@@ -79,6 +85,34 @@ clear_exit:
 	reprepare_packed_git(the_repository);
 }
 
+/*
+ * Cleanup after batch-mode fsync_object_files.
+ */
+static void do_batch_fsync(void)
+{
+	/*
+	 * Issue a full hardware flush against a temporary file to ensure
+	 * that all objects are durable before any renames occur.  The code in
+	 * fsync_loose_object_bulk_checkin has already issued a writeout
+	 * request, but it has not flushed any writeback cache in the storage
+	 * hardware.
+	 */
+
+	if (needs_batch_fsync) {
+		struct strbuf temp_path = STRBUF_INIT;
+		struct tempfile *temp;
+
+		strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
+		temp = xmks_tempfile(temp_path.buf);
+		fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
+		delete_tempfile(&temp);
+		strbuf_release(&temp_path);
+	}
+
+	if (bulk_fsync_objdir)
+		tmp_objdir_migrate(bulk_fsync_objdir);
+}
+
 static int already_written(struct bulk_checkin_state *state, struct object_id *oid)
 {
 	int i;
@@ -273,6 +307,25 @@ static int deflate_to_pack(struct bulk_checkin_state *state,
 	return 0;
 }
 
+void fsync_loose_object_bulk_checkin(int fd)
+{
+	assert(fsync_object_files == FSYNC_OBJECT_FILES_BATCH);
+
+	/*
+	 * If we have a plugged bulk checkin, we issue a call that
+	 * cleans the filesystem page cache but avoids a hardware flush
+	 * command. Later on we will issue a single hardware flush
+	 * before as part of do_batch_fsync.
+	 */
+	if (bulk_checkin_plugged &&
+	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) >= 0) {
+		if (!needs_batch_fsync)
+			needs_batch_fsync = 1;
+	} else {
+		fsync_or_die(fd, "loose object file");
+	}
+}
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags)
@@ -287,6 +340,19 @@ int index_bulk_checkin(struct object_id *oid,
 void plug_bulk_checkin(void)
 {
 	assert(!bulk_checkin_plugged);
+
+	/*
+	 * A temporary object directory is used to hold the files
+	 * while they are not fsynced.
+	 */
+	if (fsync_object_files == FSYNC_OBJECT_FILES_BATCH) {
+		bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
+		if (!bulk_fsync_objdir)
+			die(_("Could not create temporary object directory for core.fsyncobjectfiles=batch"));
+
+		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
+	}
+
 	bulk_checkin_plugged = 1;
 }
 
@@ -296,4 +362,6 @@ void unplug_bulk_checkin(void)
 	bulk_checkin_plugged = 0;
 	if (bulk_checkin_state.f)
 		finish_bulk_checkin(&bulk_checkin_state);
+
+	do_batch_fsync();
 }
diff --git a/bulk-checkin.h b/bulk-checkin.h
index b26f3dc3b74..08f292379b6 100644
--- a/bulk-checkin.h
+++ b/bulk-checkin.h
@@ -6,6 +6,8 @@
 
 #include "cache.h"
 
+void fsync_loose_object_bulk_checkin(int fd);
+
 int index_bulk_checkin(struct object_id *oid,
 		       int fd, size_t size, enum object_type type,
 		       const char *path, unsigned flags);
diff --git a/cache.h b/cache.h
index eba12487b99..6d6e6770ecc 100644
--- a/cache.h
+++ b/cache.h
@@ -985,7 +985,13 @@ void reset_shared_repository(void);
 extern int read_replace_refs;
 extern char *git_replace_ref_base;
 
-extern int fsync_object_files;
+enum fsync_object_files_mode {
+    FSYNC_OBJECT_FILES_OFF,
+    FSYNC_OBJECT_FILES_ON,
+    FSYNC_OBJECT_FILES_BATCH
+};
+
+extern enum fsync_object_files_mode fsync_object_files;
 extern int core_preload_index;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index c5873f3a706..5eb36ecd77a 100644
--- a/config.c
+++ b/config.c
@@ -1491,7 +1491,12 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
 	}
 
 	if (!strcmp(var, "core.fsyncobjectfiles")) {
-		fsync_object_files = git_config_bool(var, value);
+		if (value && !strcmp(value, "batch"))
+			fsync_object_files = FSYNC_OBJECT_FILES_BATCH;
+		else if (git_config_bool(var, value))
+			fsync_object_files = FSYNC_OBJECT_FILES_ON;
+		else
+			fsync_object_files = FSYNC_OBJECT_FILES_OFF;
 		return 0;
 	}
 
diff --git a/config.mak.uname b/config.mak.uname
index 3236a4918a3..5ead1377667 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -57,6 +57,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	# -lrt is needed for clock_gettime on glibc <= 2.16
 	NEEDS_LIBRT = YesPlease
+	HAVE_SYNC_FILE_RANGE = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
 	FREAD_READS_DIRECTORIES = UnfortunatelyYes
diff --git a/configure.ac b/configure.ac
index 031e8d3fee8..c711037d625 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1090,6 +1090,14 @@ AC_COMPILE_IFELSE([CLOCK_MONOTONIC_SRC],
 	[AC_MSG_RESULT([no])
 	HAVE_CLOCK_MONOTONIC=])
 GIT_CONF_SUBST([HAVE_CLOCK_MONOTONIC])
+
+#
+# Define HAVE_SYNC_FILE_RANGE=YesPlease if sync_file_range is available.
+GIT_CHECK_FUNC(sync_file_range,
+	[HAVE_SYNC_FILE_RANGE=YesPlease],
+	[HAVE_SYNC_FILE_RANGE])
+GIT_CONF_SUBST([HAVE_SYNC_FILE_RANGE])
+
 #
 # Define NO_SETITIMER if you don't have setitimer.
 GIT_CHECK_FUNC(setitimer,
diff --git a/environment.c b/environment.c
index 2701dfeeec8..aeafe80235e 100644
--- a/environment.c
+++ b/environment.c
@@ -42,7 +42,7 @@ const char *git_attributes_file;
 const char *git_hooks_path;
 int zlib_compression_level = Z_BEST_SPEED;
 int pack_compression_level = Z_DEFAULT_COMPRESSION;
-int fsync_object_files;
+enum fsync_object_files_mode fsync_object_files;
 size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
 size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
 size_t delta_base_cache_limit = 96 * 1024 * 1024;
diff --git a/git-compat-util.h b/git-compat-util.h
index d70ce142861..4defd4ab200 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1214,6 +1214,13 @@ __attribute__((format (printf, 1, 2))) NORETURN
 void BUG(const char *fmt, ...);
 #endif
 
+enum fsync_action {
+    FSYNC_WRITEOUT_ONLY,
+    FSYNC_HARDWARE_FLUSH
+};
+
+int git_fsync(int fd, enum fsync_action action);
+
 /*
  * Preserves errno, prints a message, but gives no warning for ENOENT.
  * Returns 0 on success, which includes trying to unlink an object that does
diff --git a/object-file.c b/object-file.c
index 659ef7623ff..9d0aac792ae 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1853,8 +1853,18 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
 static void close_loose_object(int fd)
 {
 	if (!the_repository->objects->odb->will_destroy) {
-		if (fsync_object_files)
+		switch (fsync_object_files) {
+		case FSYNC_OBJECT_FILES_OFF:
+			break;
+		case FSYNC_OBJECT_FILES_ON:
 			fsync_or_die(fd, "loose object file");
+			break;
+		case FSYNC_OBJECT_FILES_BATCH:
+			fsync_loose_object_bulk_checkin(fd);
+			break;
+		default:
+			BUG("Invalid fsync_object_files mode.");
+		}
 	}
 
 	if (close(fd) != 0)
diff --git a/wrapper.c b/wrapper.c
index 36e12119d76..689288d2e31 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -546,6 +546,50 @@ int xmkstemp_mode(char *filename_template, int mode)
 	return fd;
 }
 
+int git_fsync(int fd, enum fsync_action action)
+{
+	switch (action) {
+	case FSYNC_WRITEOUT_ONLY:
+
+#ifdef __APPLE__
+		/*
+		 * on macOS, fsync just causes filesystem cache writeback but does not
+		 * flush hardware caches.
+		 */
+		return fsync(fd);
+#endif
+
+#ifdef HAVE_SYNC_FILE_RANGE
+		/*
+		 * On linux 2.6.17 and above, sync_file_range is the way to issue
+		 * a writeback without a hardware flush. An offset of 0 and size of 0
+		 * indicates writeout of the entire file and the wait flags ensure that all
+		 * dirty data is written to the disk (potentially in a disk-side cache)
+		 * before we continue.
+		 */
+
+		return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE |
+						 SYNC_FILE_RANGE_WRITE |
+						 SYNC_FILE_RANGE_WAIT_AFTER);
+#endif
+
+		errno = ENOSYS;
+		return -1;
+
+	case FSYNC_HARDWARE_FLUSH:
+
+#ifdef __APPLE__
+		return fcntl(fd, F_FULLFSYNC);
+#else
+		return fsync(fd);
+#endif
+
+	default:
+		BUG("unexpected git_fsync(%d) call", action);
+	}
+
+}
+
 static int warn_if_unremovable(const char *op, const char *file, int rc)
 {
 	int err;
diff --git a/write-or-die.c b/write-or-die.c
index 0b1ec8190b6..cc8291d9794 100644
--- a/write-or-die.c
+++ b/write-or-die.c
@@ -57,7 +57,7 @@ void fprintf_or_die(FILE *f, const char *fmt, ...)
 
 void fsync_or_die(int fd, const char *msg)
 {
-	while (fsync(fd) < 0) {
+	while (git_fsync(fd, FSYNC_HARDWARE_FLUSH) < 0) {
 		if (errno != EINTR)
 			die_errno("fsync error on '%s'", msg);
 	}
-- 
gitgitgadget



* [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (3 preceding siblings ...)
  2021-11-15 23:50                 ` [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
@ 2021-11-15 23:50                 ` Neeraj Singh via GitGitGadget
  2021-11-15 23:51                 ` [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
                                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:50 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

This commit adds a win32 implementation of fsync_no_flush, which is
called by git_fsync. The 'NtFlushBuffersFileEx' function being called
is available since Windows 8. If the function is not available, we
return -1 and Git falls back to doing a full fsync.

The operating system is told to flush file data only, without issuing
a hardware flush primitive. A later full fsync will cause the metadata
log to be flushed and then the disk cache to be flushed on NTFS and
ReFS. Other filesystems will treat this as a full flush operation.
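The fallback described above can be modeled without any Windows APIs. In this hypothetical sketch, `writeout_fn` stands in for a writeout-only primitive such as the NtFlushBuffersFileEx call, and a counter stands in for hardware flushes. The decision mirrors the shape of fsync_loose_object_bulk_checkin and do_batch_fsync from the previous patch: 500 objects cost one full flush when the primitive works, and 500 when it reports ENOSYS.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a platform's writeout-only primitive. */
typedef int (*writeout_fn)(int fd);

static int full_flushes;      /* counts expensive hardware flushes */
static int needs_batch_flush; /* set once any writeout succeeded */

static void full_flush_stub(int fd)
{
	(void)fd;
	full_flushes++;
}

/* While plugged, prefer the cheap writeout; if it is unavailable
 * (or we are not plugged), pay for a full flush immediately. */
static void close_object(int fd, int plugged, writeout_fn writeout)
{
	if (plugged && writeout(fd) >= 0)
		needs_batch_flush = 1;
	else
		full_flush_stub(fd);
}

/* At unplug, a single full flush covers every object written out above. */
static void unplug(void)
{
	if (needs_batch_flush)
		full_flush_stub(-1); /* -1: stands in for the dummy file */
	needs_batch_flush = 0;
}

static int writeout_ok(int fd)
{
	(void)fd;
	return 0;
}

static int writeout_unavailable(int fd)
{
	(void)fd;
	errno = ENOSYS;
	return -1;
}
```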

I added a new file here for this system call so as not to conflict with
downstream changes in the git-for-windows repository related to fscache.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 compat/mingw.h                      |  3 +++
 compat/win32/flush.c                | 28 ++++++++++++++++++++++++++++
 config.mak.uname                    |  2 ++
 contrib/buildsystems/CMakeLists.txt |  3 ++-
 wrapper.c                           |  4 ++++
 5 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 compat/win32/flush.c

diff --git a/compat/mingw.h b/compat/mingw.h
index c9a52ad64a6..6074a3d3ced 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -329,6 +329,9 @@ int mingw_getpagesize(void);
 #define getpagesize mingw_getpagesize
 #endif
 
+int win32_fsync_no_flush(int fd);
+#define fsync_no_flush win32_fsync_no_flush
+
 struct rlimit {
 	unsigned int rlim_cur;
 };
diff --git a/compat/win32/flush.c b/compat/win32/flush.c
new file mode 100644
index 00000000000..75324c24ee7
--- /dev/null
+++ b/compat/win32/flush.c
@@ -0,0 +1,28 @@
+#include "../../git-compat-util.h"
+#include <winternl.h>
+#include "lazyload.h"
+
+int win32_fsync_no_flush(int fd)
+{
+       IO_STATUS_BLOCK io_status;
+
+#define FLUSH_FLAGS_FILE_DATA_ONLY 1
+
+       DECLARE_PROC_ADDR(ntdll.dll, NTSTATUS, NtFlushBuffersFileEx,
+			 HANDLE FileHandle, ULONG Flags, PVOID Parameters, ULONG ParameterSize,
+			 PIO_STATUS_BLOCK IoStatusBlock);
+
+       if (!INIT_PROC_ADDR(NtFlushBuffersFileEx)) {
+		errno = ENOSYS;
+		return -1;
+       }
+
+       memset(&io_status, 0, sizeof(io_status));
+       if (NtFlushBuffersFileEx((HANDLE)_get_osfhandle(fd), FLUSH_FLAGS_FILE_DATA_ONLY,
+				NULL, 0, &io_status)) {
+		errno = EINVAL;
+		return -1;
+       }
+
+       return 0;
+}
diff --git a/config.mak.uname b/config.mak.uname
index 5ead1377667..5727fb093ca 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -455,6 +455,7 @@ endif
 	CFLAGS =
 	BASIC_CFLAGS = -nologo -I. -Icompat/vcbuild/include -DWIN32 -D_CONSOLE -DHAVE_STRING_H -D_CRT_SECURE_NO_WARNINGS -D_CRT_NONSTDC_NO_DEPRECATE
 	COMPAT_OBJS = compat/msvc.o compat/winansi.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/trace2_win32_process_info.o \
@@ -630,6 +631,7 @@ ifeq ($(uname_S),MINGW)
 	COMPAT_CFLAGS += -DSTRIP_EXTENSION=\".exe\"
 	COMPAT_OBJS += compat/mingw.o compat/winansi.o \
 		compat/win32/trace2_win32_process_info.o \
+		compat/win32/flush.o \
 		compat/win32/path-utils.o \
 		compat/win32/pthread.o compat/win32/syslog.o \
 		compat/win32/dirent.o
diff --git a/contrib/buildsystems/CMakeLists.txt b/contrib/buildsystems/CMakeLists.txt
index fd1399c440f..ef0c1e4976d 100644
--- a/contrib/buildsystems/CMakeLists.txt
+++ b/contrib/buildsystems/CMakeLists.txt
@@ -261,7 +261,8 @@ if(CMAKE_SYSTEM_NAME STREQUAL "Windows")
 				NOGDI OBJECT_CREATION_MODE=1 __USE_MINGW_ANSI_STDIO=0
 				USE_NED_ALLOCATOR OVERRIDE_STRDUP MMAP_PREVENTS_DELETE USE_WIN32_MMAP
 				UNICODE _UNICODE HAVE_WPGMPTR ENSURE_MSYSTEM_IS_SET)
-	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c compat/win32/path-utils.c
+	list(APPEND compat_SOURCES compat/mingw.c compat/winansi.c
+		compat/win32/flush.c compat/win32/path-utils.c
 		compat/win32/pthread.c compat/win32mmap.c compat/win32/syslog.c
 		compat/win32/trace2_win32_process_info.c compat/win32/dirent.c
 		compat/nedmalloc/nedmalloc.c compat/strdup.c)
diff --git a/wrapper.c b/wrapper.c
index 689288d2e31..ece3d2ca106 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -573,6 +573,10 @@ int git_fsync(int fd, enum fsync_action action)
 						 SYNC_FILE_RANGE_WAIT_AFTER);
 #endif
 
+#ifdef fsync_no_flush
+		return fsync_no_flush(fd);
+#endif
+
 		errno = ENOSYS;
 		return -1;
 
-- 
gitgitgadget



* [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (4 preceding siblings ...)
  2021-11-15 23:50                 ` [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
@ 2021-11-15 23:51                 ` Neeraj Singh via GitGitGadget
  2021-11-15 23:51                 ` [PATCH v9 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
                                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:51 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The update-index functionality is used internally by 'git stash push' to
set up the internal stashed commit.

This change enables bulk-checkin for the update-index infrastructure to
speed up adding new objects to the object database by leveraging the
pack functionality and the new bulk-fsync functionality.

There is some risk with this change, since under batch fsync, the object
files will not be visible in the object database until update-index has
entirely completed. Relying on earlier visibility is unlikely, since any
tool invoking update-index and expecting to see objects would have to
synchronize with the update-index process after passing it a file path.
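The visibility window described above can be illustrated with a toy model in which objects written while plugged go to a staging store and become visible to other processes only at unplug. This is a hypothetical simplification, not Git's object database: in the patch, the writing process itself can still read its staged objects, because the tmp-objdir becomes its primary object directory.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJS 16

/* Toy object store holding object names only. */
struct store {
	const char *names[MAX_OBJS];
	int nr;
};

static struct store primary, staging;
static int plugged;

static void add_object(const char *name)
{
	struct store *dst = plugged ? &staging : &primary;

	assert(dst->nr < MAX_OBJS);
	dst->names[dst->nr++] = name;
}

/* Lookup as an outside process would see it: the primary store only. */
static int visible(const char *name)
{
	for (int i = 0; i < primary.nr; i++)
		if (!strcmp(primary.names[i], name))
			return 1;
	return 0;
}

static void plug(void)
{
	assert(!plugged);
	plugged = 1;
}

/* Migrate staged objects to the primary store, like tmp_objdir_migrate(). */
static void unplug(void)
{
	assert(plugged);
	plugged = 0;
	for (int i = 0; i < staging.nr; i++)
		add_object(staging.names[i]);
	staging.nr = 0;
}
```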

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/update-index.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 187203e8bb5..dc7368bb1ee 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -5,6 +5,7 @@
  */
 #define USE_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "lockfile.h"
 #include "quote.h"
@@ -1088,6 +1089,9 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 	the_index.updated_skipworktree = 1;
 
+	/* we might be adding many objects to the object database */
+	plug_bulk_checkin();
+
 	/*
 	 * Custom copy of parse_options() because we want to handle
 	 * filename arguments as they come.
@@ -1168,6 +1172,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		strbuf_release(&buf);
 	}
 
+	/* by now we must have added all of the new objects */
+	unplug_bulk_checkin();
 	if (split_index > 0) {
 		if (git_config_get_split_index() == 0)
 			warning(_("core.splitIndex is set to false; "
-- 
gitgitgadget



* [PATCH v9 7/9] unpack-objects: use the bulk-checkin infrastructure
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (5 preceding siblings ...)
  2021-11-15 23:51                 ` [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
@ 2021-11-15 23:51                 ` Neeraj Singh via GitGitGadget
  2021-11-15 23:51                 ` [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
                                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:51 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

The unpack-objects functionality is used by fetch, push, and fast-import
to turn the transferred data into object database entries when there are
fewer objects than the 'unpacklimit' setting.

By enabling bulk-checkin when unpacking objects, we can take advantage
of batched fsyncs.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 builtin/unpack-objects.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 4a9466295ba..51eb4f7b531 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "bulk-checkin.h"
 #include "config.h"
 #include "object-store.h"
 #include "object.h"
@@ -503,10 +504,12 @@ static void unpack_all(void)
 	if (!quiet)
 		progress = start_progress(_("Unpacking objects"), nr_objects);
 	CALLOC_ARRAY(obj_list, nr_objects);
+	plug_bulk_checkin();
 	for (i = 0; i < nr_objects; i++) {
 		unpack_one(i);
 		display_progress(progress, i + 1);
 	}
+	unplug_bulk_checkin();
 	stop_progress(&progress);
 
 	if (delta_list)
-- 
gitgitgadget



* [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (6 preceding siblings ...)
  2021-11-15 23:51                 ` [PATCH v9 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
@ 2021-11-15 23:51                 ` Neeraj Singh via GitGitGadget
  2021-11-15 23:51                 ` [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
  2021-11-16  8:02                 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:51 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add test cases to exercise batch mode for:
 * 'git add'
 * 'git stash'
 * 'git update-index'
 * 'git unpack-objects'

These tests ensure that the added data winds up in the object database.

In this change we introduce a new test helper, lib-unique-files.sh. The
goal of this library is to create a tree of files whose oids differ from
those of any other files that may have been created in the current test
repo. This ensures that each test actually exercises adding new objects,
rather than passing because an object was already in the repo.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/lib-unique-files.sh  | 36 ++++++++++++++++++++++++++++++++++++
 t/t3700-add.sh         | 20 ++++++++++++++++++++
 t/t3903-stash.sh       | 14 ++++++++++++++
 t/t5300-pack-object.sh | 30 +++++++++++++++++++-----------
 4 files changed, 89 insertions(+), 11 deletions(-)
 create mode 100644 t/lib-unique-files.sh

diff --git a/t/lib-unique-files.sh b/t/lib-unique-files.sh
new file mode 100644
index 00000000000..a7de4ca8512
--- /dev/null
+++ b/t/lib-unique-files.sh
@@ -0,0 +1,36 @@
+# Helper to create files with unique contents
+
+
+# Create multiple files with unique contents. Takes the number of
+# directories, the number of files in each directory, and the base
+# directory.
+#
+# test_create_unique_files 2 3 my_dir -- Creates 2 directories with 3 files
+#					 each in my_dir, all with unique
+#					 contents.
+
+test_create_unique_files() {
+	test "$#" -ne 3 && BUG "3 param"
+
+	local dirs=$1
+	local files=$2
+	local basedir=$3
+	local counter=0
+	test_tick
+	local basedata=$test_tick
+
+
+	rm -rf $basedir
+
+	for i in $(test_seq $dirs)
+	do
+		local dir=$basedir/dir$i
+
+		mkdir -p "$dir"
+		for j in $(test_seq $files)
+		do
+			counter=$((counter + 1))
+			echo "$basedata.$counter"  >"$dir/file$j.txt"
+		done
+	done
+}
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index 283a66955d6..aaecefda159 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -8,6 +8,8 @@ test_description='Test of git add, including the -- option.'
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+. $TEST_DIRECTORY/lib-unique-files.sh
+
 # Test the file mode "$1" of the file "$2" in the index.
 test_mode_in_index () {
 	case "$(git ls-files -s "$2")" in
@@ -34,6 +36,24 @@ test_expect_success \
     'Test that "git add -- -q" works' \
     'touch -- -q && git add -- -q'
 
+test_expect_success 'git add: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch add -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+	git ls-files --stage fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$2}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+test_expect_success 'git update-index: core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files2 &&
+	find fsync-files2 ! -type d -print | xargs git -c core.fsyncobjectfiles=batch update-index --add -- &&
+	rm -f fsynced_files2 &&
+	git ls-files --stage fsync-files2/ > fsynced_files2 &&
+	test_line_count = 8 fsynced_files2 &&
+	awk -- '{print \$2}' fsynced_files2 | xargs -n1 git cat-file -e
+"
+
 test_expect_success \
 	'git add: Test that executable bit is not used if core.filemode=0' \
 	'git config core.filemode 0 &&
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index f0a82be9de7..6324b52c874 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -9,6 +9,7 @@ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
 export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
+. $TEST_DIRECTORY/lib-unique-files.sh
 
 diff_cmp () {
 	for i in "$1" "$2"
@@ -1293,6 +1294,19 @@ test_expect_success 'stash handles skip-worktree entries nicely' '
 	git rev-parse --verify refs/stash:A.t
 '
 
+test_expect_success 'stash with core.fsyncobjectfiles=batch' "
+	test_create_unique_files 2 4 fsync-files &&
+	git -c core.fsyncobjectfiles=batch stash push -u -- ./fsync-files/ &&
+	rm -f fsynced_files &&
+
+	# The files were untracked, so use the third parent,
+	# which contains the untracked files
+	git ls-tree -r stash^3 -- ./fsync-files/ > fsynced_files &&
+	test_line_count = 8 fsynced_files &&
+	awk -- '{print \$3}' fsynced_files | xargs -n1 git cat-file -e
+"
+
+
 test_expect_success 'stash -c stash.useBuiltin=false warning ' '
 	expected="stash.useBuiltin support has been removed" &&
 
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index e13a8842075..38663dc1393 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -162,23 +162,23 @@ test_expect_success 'pack-objects with bogus arguments' '
 
 check_unpack () {
 	test_when_finished "rm -rf git2" &&
-	git init --bare git2 &&
-	git -C git2 unpack-objects -n <"$1".pack &&
-	git -C git2 unpack-objects <"$1".pack &&
-	(cd .git && find objects -type f -print) |
-	while read path
-	do
-		cmp git2/$path .git/$path || {
-			echo $path differs.
-			return 1
-		}
-	done
+	git $2 init --bare git2 &&
+	(
+		git $2 -C git2 unpack-objects -n <"$1".pack &&
+		git $2 -C git2 unpack-objects <"$1".pack &&
+		git $2 -C git2 cat-file --batch-check="%(objectname)"
+	) <obj-list >current &&
+	cmp obj-list current
 }
 
 test_expect_success 'unpack without delta' '
 	check_unpack test-1-${packname_1}
 '
 
+test_expect_success 'unpack without delta (core.fsyncobjectfiles=batch)' '
+	check_unpack test-1-${packname_1} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with REF_DELTA' '
 	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
 	check_deltas stderr -gt 0
@@ -188,6 +188,10 @@ test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
+test_expect_success 'unpack with REF_DELTA (core.fsyncobjectfiles=batch)' '
+	check_unpack test-2-${packname_2} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'pack with OFS_DELTA' '
 	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
 			<obj-list 2>stderr) &&
@@ -198,6 +202,10 @@ test_expect_success 'unpack with OFS_DELTA' '
 	check_unpack test-3-${packname_3}
 '
 
+test_expect_success 'unpack with OFS_DELTA (core.fsyncobjectfiles=batch)' '
+	check_unpack test-3-${packname_3} "-c core.fsyncobjectfiles=batch"
+'
+
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
 		defined($_ = -s $_) or die for @ARGV;
-- 
gitgitgadget



* [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (7 preceding siblings ...)
  2021-11-15 23:51                 ` [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
@ 2021-11-15 23:51                 ` Neeraj Singh via GitGitGadget
  2021-11-16  8:02                 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
  9 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh via GitGitGadget @ 2021-11-15 23:51 UTC (permalink / raw)
  To: git
  Cc: Neeraj-Personal, Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Neeraj Singh

From: Neeraj Singh <neerajsi@microsoft.com>

Add a basic performance test for "git add" and "git stash" of a lot of
new objects with various fsync settings.

Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
---
 t/perf/p3700-add.sh   | 43 ++++++++++++++++++++++++++++++++++++++++
 t/perf/p3900-stash.sh | 46 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+)
 create mode 100755 t/perf/p3700-add.sh
 create mode 100755 t/perf/p3900-stash.sh

diff --git a/t/perf/p3700-add.sh b/t/perf/p3700-add.sh
new file mode 100755
index 00000000000..e93c08a2e70
--- /dev/null
+++ b/t/perf/p3700-add.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+#
+# This test measures the performance of adding new files to the object database
+# and index. The test was originally added to measure the effect of the
+# core.fsyncObjectFiles=batch mode, which is why we are testing different values
+# of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of add"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	test_perf "add $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m add files
+	"
+done
+
+test_done
diff --git a/t/perf/p3900-stash.sh b/t/perf/p3900-stash.sh
new file mode 100755
index 00000000000..c9fcd0c03eb
--- /dev/null
+++ b/t/perf/p3900-stash.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+#
+# This test measures the performance of stashing new untracked files, which
+# writes them to the object database. The test was originally added to measure
+# the effect of the core.fsyncObjectFiles=batch mode, which is why we are testing
+# different values of that setting explicitly and creating a lot of unique objects.
+
+test_description="Tests performance of stash"
+
+. ./perf-lib.sh
+
+. $TEST_DIRECTORY/lib-unique-files.sh
+
+test_perf_default_repo
+test_checkout_worktree
+
+dir_count=10
+files_per_dir=50
+total_files=$((dir_count * files_per_dir))
+
+# We need to create the files each time we run the perf test, but
+# we do not want to measure the cost of creating the files, so run
+# the test once.
+if test "${GIT_PERF_REPEAT_COUNT-1}" -ne 1
+then
+	echo "warning: Setting GIT_PERF_REPEAT_COUNT=1" >&2
+	GIT_PERF_REPEAT_COUNT=1
+fi
+
+for m in false true batch
+do
+	test_expect_success "create the files for core.fsyncObjectFiles=$m" '
+		git reset --hard &&
+		# create files across directories
+		test_create_unique_files $dir_count $files_per_dir files
+	'
+
+	# We only stash files in the 'files' subdirectory since
+	# the perf test infrastructure creates files in the
+	# current working directory that need to be preserved
+	test_perf "stash $total_files files (core.fsyncObjectFiles=$m)" "
+		git -c core.fsyncobjectfiles=$m stash push -u -- files
+	"
+done
+
+test_done
-- 
gitgitgadget


* Re: [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb
  2021-11-15 23:50                 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
@ 2021-11-16  7:23                   ` Ævar Arnfjörð Bjarmason
  2021-11-16 20:38                     ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-16  7:23 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj Singh


On Mon, Nov 15 2021, Neeraj Singh via GitGitGadget wrote:

>  	/*
>  	 * This object store is ephemeral, so there is no need to fsync.
>  	 */
> -	int will_destroy;
> +	unsigned int will_destroy : 1;
>  
>  	/*
>  	 * Path to the alternative object store. If this is a relative path,

Why add this as an int in the preceding commit and turn it "unsigned :
1" here?


* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
                                   ` (8 preceding siblings ...)
  2021-11-15 23:51                 ` [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
@ 2021-11-16  8:02                 ` Ævar Arnfjörð Bjarmason
  2021-11-17  7:06                   ` Neeraj Singh
  9 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-16  8:02 UTC (permalink / raw)
  To: Neeraj K. Singh via GitGitGadget
  Cc: git, Neeraj-Personal, Johannes Schindelin, Jeff King,
	Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh


On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:

>  * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
>    'true', 'false', and 'batch'. This makes using old and new versions of
>    git with 'batch' mode a little trickier, but hopefully people will
>    generally be moving forward in versions.
>
> [1] See
> https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
> [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/

I really think leaving that in-place is just being unnecessarily
cavalier. There are a lot of mixed-version environments in which git is
deployed, and we almost never break the configuration in this way (I
think in the past always by mistake).

In this case it's easy to avoid it, and coming up with a less narrow
config model[1] seems like a good idea in any case to unify the various
outstanding work in this area.
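The compatibility concern can be illustrated with a small sketch: a
parser that only understands booleans (standing in for an older git
reading core.fsyncObjectFiles) has no interpretation for "batch", while a
newer tri-state parser accepts it. All names here are hypothetical, and
git's real config code (its accepted boolean spellings and its error
handling) differs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical values for an fsync setting; not git's actual enum. */
enum fsync_mode {
	FSYNC_MODE_INVALID = -1,
	FSYNC_MODE_OFF = 0,
	FSYNC_MODE_ON = 1,
	FSYNC_MODE_BATCH = 2
};

/*
 * Strictly boolean parser, standing in for how a version that only
 * knows true/false would read the setting. Case-sensitive for brevity.
 * Returns 1, 0, or -1 when the value is not a recognized boolean.
 */
static int parse_bool_value(const char *v)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0" };
	size_t i;

	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
		if (!strcmp(v, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
		if (!strcmp(v, falsy[i]))
			return 0;
	return -1;
}

/* Tri-state parser: keeps the boolean spellings and adds "batch". */
static enum fsync_mode parse_fsync_mode(const char *v)
{
	int b;

	assert(v != NULL);
	b = parse_bool_value(v);
	if (b >= 0)
		return b ? FSYNC_MODE_ON : FSYNC_MODE_OFF;
	if (!strcmp(v, "batch"))
		return FSYNC_MODE_BATCH;
	return FSYNC_MODE_INVALID;
}
```

A reader that only has parse_bool_value() cannot make sense of "batch",
which is the mixed-version hazard. One way to avoid it, for example, is
to introduce the new mode under a separate key, since older versions
generally ignore config keys they do not know.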

More generally on this series, per the thread ending in [2] I really
don't get why we have code like this:
	
	@@ -503,10 +504,12 @@ static void unpack_all(void)
	 	if (!quiet)
	 		progress = start_progress(_("Unpacking objects"), nr_objects);
	 	CALLOC_ARRAY(obj_list, nr_objects);
	+	plug_bulk_checkin();
	 	for (i = 0; i < nr_objects; i++) {
	 		unpack_one(i);
	 		display_progress(progress, i + 1);
	 	}
	+	unplug_bulk_checkin();
	 	stop_progress(&progress);
	 
	 	if (delta_list)

As opposed to doing an fsync on the last object we're
processing. I.e. why do we need the step of intentionally making the
objects unavailable in the tmp-objdir, and creating a "cookie" file to
sync at the start/end, as opposed to fsyncing on the last file (which
we're writing out anyway).

1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/


* Re: [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb
  2021-11-16  7:23                   ` Ævar Arnfjörð Bjarmason
@ 2021-11-16 20:38                     ` Neeraj Singh
  0 siblings, 0 replies; 160+ messages in thread
From: Neeraj Singh @ 2021-11-16 20:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj Singh

On Mon, Nov 15, 2021 at 11:24 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Nov 15 2021, Neeraj Singh via GitGitGadget wrote:
>
> >       /*
> >        * This object store is ephemeral, so there is no need to fsync.
> >        */
> > -     int will_destroy;
> > +     unsigned int will_destroy : 1;
> >
> >       /*
> >        * Path to the alternative object store. If this is a relative path,
>
> Why add this as an int in the preceding commit and turn it "unsigned :
> 1" here?

Good catch.  I'll fix this. This was an artifact of a previous version
where there was another variable here as well.

Thanks,
Neeraj


* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-11-16  8:02                 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
@ 2021-11-17  7:06                   ` Neeraj Singh
  2021-11-17  7:24                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-11-17  7:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong

On Tue, Nov 16, 2021 at 12:10 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:
>
> >  * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
> >    'true', 'false', and 'batch'. This makes using old and new versions of
> >    git with 'batch' mode a little trickier, but hopefully people will
> >    generally be moving forward in versions.
> >
> > [1] See
> > https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
> > [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/
>
> I really think leaving that in-place is just being unnecessarily
> cavalier. There are a lot of mixed-version environments in which git is
> deployed, and we almost never break the configuration in this way (I
> think in the past always by mistake).

> In this case it's easy to avoid it, and coming up with a less narrow
> config model[1] seems like a good idea in any case to unify the various
> outstanding work in this area.
>
> More generally on this series, per the thread ending in [2] I really

My primary goal in all of these changes is to move git-for-windows over
to a default of batch fsync so that it can get closer to other platforms
in 'git add' performance while still retaining the same level of data
integrity. I'm hoping that most end-users are just sticking to defaults
here.

I'm happy to change the configuration schema again if there's a
consensus from the Git community that backwards-compatibility of the
configuration is actually important to someone.

Also, if we're doing a deeper rethink of the fsync configuration (as
prompted by this work and Eric Wong's and Patrick Steinhardt's work), do
we want to retain a mode where we fsync some parts of the persistent
repo data but not others? If we add fsyncing of the index in addition to
the refs, I believe we would have covered all of the critical data
structures that would be needed to find the data that a user has added
to the repo if they complete a series of git commands and then
experience a system crash.

> don't get why we have code like this:
>
>         @@ -503,10 +504,12 @@ static void unpack_all(void)
>                 if (!quiet)
>                         progress = start_progress(_("Unpacking objects"), nr_objects);
>                 CALLOC_ARRAY(obj_list, nr_objects);
>         +       plug_bulk_checkin();
>                 for (i = 0; i < nr_objects; i++) {
>                         unpack_one(i);
>                         display_progress(progress, i + 1);
>                 }
>         +       unplug_bulk_checkin();
>                 stop_progress(&progress);
>
>                 if (delta_list)
>
> As opposed to doing an fsync on the last object we're
> processing. I.e. why do we need the step of intentionally making the
> objects unavailable in the tmp-objdir, and creating a "cookie" file to
> sync at the start/end, as opposed to fsyncing on the last file (which
> we're writing out anyway).
>
> 1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/

It's important to not expose an object's final name until its contents
have been fsynced to disk. We want to ensure that whenever we crash, we
won't have a loose object that Git may later try to open where the
filename doesn't match the content hash. I believe it's okay for a given
OID to be missing, since a later command could recreate it, but an
object with a wrong hash looks like it would persist until we do a
git-fsck.
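The invariant described above is commonly implemented as "write to a
temporary name, fsync, then rename". The following is a generic POSIX
sketch of that sequence for a single file, not git's code;
write_then_publish() and the tmp_obj_XXXXXX naming are hypothetical. A
complete implementation would typically also fsync the containing
directory so that the rename itself survives a crash, which this sketch
omits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: make `len` bytes durable under a temporary name,
 * then expose them as `final_path`. A crash before the rename leaves at
 * most a stray temporary file, never a final name with partial contents.
 * Returns 0 on success, -1 on failure.
 */
static int write_then_publish(const char *dir, const char *final_path,
			      const void *data, size_t len)
{
	char tmp_path[4096];
	const char *p = data;
	int fd, rc;

	assert(dir != NULL && final_path != NULL);

	/* 1. Create a temporary file in the same directory (same filesystem). */
	if (snprintf(tmp_path, sizeof(tmp_path), "%s/tmp_obj_XXXXXX", dir)
	    >= (int)sizeof(tmp_path))
		return -1;
	fd = mkstemp(tmp_path);
	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0)
			goto fail;
		p += n;
		len -= (size_t)n;
	}

	/* 2. Flush the contents before the final name can be seen. */
	if (fsync(fd) < 0)
		goto fail;
	rc = close(fd);
	fd = -1; /* the descriptor is released even if close() fails */
	if (rc < 0)
		goto fail;

	/* 3. rename() atomically makes the final name point at the data. */
	if (rename(tmp_path, final_path) < 0)
		goto fail;
	return 0;

fail:
	if (fd >= 0)
		close(fd);
	unlink(tmp_path);
	return -1;
}
```

Doing this for every object costs one full fsync per object, which is the
overhead the batch mode in this series tries to amortize.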

I thought about figuring out how to sync the last object rather than
some random "cookie" file, but it wasn't clear to me how I'd figure out
which object is actually last from library code in a way that doesn't
burden each command with somehow figuring out its last object and
communicating that. The 'cookie' approach seems to lead to a cleaner
interface for callers.
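A structural sketch of the "cookie" approach described above: objects are
written under temporary names, a single fsync() on a throwaway cookie
file stands in for the one hardware flush, and only then are the final
names exposed. This is not the series' code; it stages files within one
directory rather than in a temporary object directory, and every name is
hypothetical. It also assumes a writeback-only primitive, left here as
the no-op placeholder start_writeback(); under plain POSIX semantics,
fsync() on one file does not make other files' data durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_BATCH 64

/* Hypothetical set of objects staged under temporary names. */
struct staged_batch {
	const char *dir;
	size_t n;
	char tmp[MAX_BATCH][256];
	char final[MAX_BATCH][256];
};

/*
 * ASSUMPTION: the batch design relies on a platform primitive that
 * starts writeback of fd's data without a full disk-cache flush.
 * This placeholder does nothing, so on its own it guarantees nothing.
 */
static void start_writeback(int fd)
{
	(void)fd;
}

/* Write one object under a temporary name; nothing is published yet. */
static int stage_object(struct staged_batch *b, const char *name,
			const char *data)
{
	size_t len = strlen(data);
	int fd;

	if (b->n >= MAX_BATCH)
		return -1;
	snprintf(b->tmp[b->n], sizeof(b->tmp[b->n]), "%s/tmp_XXXXXX", b->dir);
	snprintf(b->final[b->n], sizeof(b->final[b->n]), "%s/%s", b->dir, name);
	fd = mkstemp(b->tmp[b->n]);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		unlink(b->tmp[b->n]);
		return -1;
	}
	start_writeback(fd);
	close(fd);
	b->n++;
	return 0;
}

/*
 * One fsync() of a throwaway "cookie" file stands in for the single
 * hardware flush; only afterwards do the final names become visible.
 */
static int publish_batch(struct staged_batch *b)
{
	char cookie[256];
	size_t i;
	int fd, rc;

	assert(b != NULL);
	snprintf(cookie, sizeof(cookie), "%s/cookie_XXXXXX", b->dir);
	fd = mkstemp(cookie);
	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	unlink(cookie);
	if (rc < 0)
		return -1;

	for (i = 0; i < b->n; i++)
		if (rename(b->tmp[i], b->final[i]) < 0)
			return -1;
	b->n = 0;
	return 0;
}
```

The interface point made above is visible here: callers only stage
objects and then publish the batch, and never need to know which object
comes last.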

Thanks,
Neeraj


* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-11-17  7:06                   ` Neeraj Singh
@ 2021-11-17  7:24                     ` Ævar Arnfjörð Bjarmason
  2021-11-18  5:03                       ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-11-17  7:24 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong


On Tue, Nov 16 2021, Neeraj Singh wrote:

> On Tue, Nov 16, 2021 at 12:10 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:
>>
>> >  * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
>> >    'true', 'false', and 'batch'. This makes using old and new versions of
>> >    git with 'batch' mode a little trickier, but hopefully people will
>> >    generally be moving forward in versions.
>> >
>> > [1] See
>> > https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
>> > [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/
>>
>> I really think leaving that in-place is just being unnecessarily
>> cavalier. There are a lot of mixed-version environments in which git is
>> deployed, and we almost never break the configuration in this way (I
>> think in the past always by mistake).
>
>> In this case it's easy to avoid it, and coming up with a less narrow
>> config model[1] seems like a good idea in any case to unify the various
>> outstanding work in this area.
>>
>> More generally on this series, per the thread ending in [2] I really
>
> My primary goal in all of these changes is to move git-for-windows over
> to a default of batch fsync so that it can get closer to other platforms
> in 'git add' performance while still retaining the same level of data
> integrity. I'm hoping that most end-users are just sticking to defaults
> here.
>
> I'm happy to change the configuration schema again if there's a
> consensus from the Git community that backwards-compatibility of the
> configuration is actually important to someone.
>
> Also, if we're doing a deeper rethink of the fsync configuration (as
> prompted by this work and Eric Wong's and Patrick Steinhardt's work), do
> we want to retain a mode where we fsync some parts of the persistent
> repo data but not others? If we add fsyncing of the index in addition to
> the refs, I believe we would have covered all of the critical data
> structures that would be needed to find the data that a user has added
> to the repo if they complete a series of git commands and then
> experience a system crash.

Just talking about it is how we'll find consensus; maybe you & Junio
would like to keep it as-is. I don't see why we'd expose this bad edge
case in configuration handling to users when it's entirely avoidable,
and we're still in the design phase.

>> don't get why we have code like this:
>>
>>         @@ -503,10 +504,12 @@ static void unpack_all(void)
>>                 if (!quiet)
>>                         progress = start_progress(_("Unpacking objects"), nr_objects);
>>                 CALLOC_ARRAY(obj_list, nr_objects);
>>         +       plug_bulk_checkin();
>>                 for (i = 0; i < nr_objects; i++) {
>>                         unpack_one(i);
>>                         display_progress(progress, i + 1);
>>                 }
>>         +       unplug_bulk_checkin();
>>                 stop_progress(&progress);
>>
>>                 if (delta_list)
>>
>> As opposed to doing an fsync on the last object we're
>> processing. I.e. why do we need the step of intentionally making the
>> objects unavailable in the tmp-objdir, and creating a "cookie" file to
>> sync at the start/end, as opposed to fsyncing on the last file (which
>> we're writing out anyway).
>>
>> 1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
>> 2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/
>
> It's important to not expose an object's final name until its contents
> have been fsynced to disk. We want to ensure that whenever we crash, we
> won't have a loose object that Git may later try to open where the
> filename doesn't match the content hash. I believe it's okay for a given
> OID to be missing, since a later command could recreate it, but an
> object with a wrong hash looks like it would persist until we do a
> git-fsck.

Yes, we handle that rather badly, as I mentioned in some other threads,
but not doing the fsync on the last object vs. a "cookie" file right
afterwards seems like a hail-mary at best, no?

> I thought about figuring out how to sync the last object rather than
> some random "cookie" file, but it wasn't clear to me how I'd figure out
> which object is actually last from library code in a way that doesn't
> burden each command with somehow figuring out its last object and
> communicating that. The 'cookie' approach seems to lead to a cleaner
> interface for callers.

The above quoted code is looping through nr_objects, isn't it? Can't a
"do fsync" be passed down to unpack_one() when we process the last loose
object?
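The suggestion above could look roughly like the following sketch, in
which a flag is threaded down to the per-object writer. write_object(),
unpack_all_sketch() and the counters are hypothetical; the real
unpack_one() in the quoted diff takes only an index. Two points from the
thread apply: this relies on the caller knowing the object count up
front, and in this sketch earlier objects are not staged, so their final
names could become visible before the single fsync happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for real object writes. */
struct write_stats {
	size_t written; /* objects written */
	size_t fsyncs;  /* fsync calls issued */
};

/* Hypothetical per-object writer that fsyncs only when asked to. */
static void write_object(struct write_stats *s, int do_fsync)
{
	s->written++;
	if (do_fsync)
		s->fsyncs++;
}

/*
 * Sketch of the suggestion: the loop knows nr_objects in advance, so it
 * can request an fsync for the last object only.
 */
static void unpack_all_sketch(struct write_stats *s, size_t nr_objects)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < nr_objects; i++)
		write_object(s, i + 1 == nr_objects);
}
```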


* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-11-17  7:24                     ` Ævar Arnfjörð Bjarmason
@ 2021-11-18  5:03                       ` Neeraj Singh
  2021-12-01 14:15                         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-11-18  5:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong

On Tue, Nov 16, 2021 at 11:28 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Nov 16 2021, Neeraj Singh wrote:
>
> > On Tue, Nov 16, 2021 at 12:10 AM Ævar Arnfjörð Bjarmason
> > <avarab@gmail.com> wrote:
> >>
> >>
> >> On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:
> >>
> >> >  * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
> >> >    'true', 'false', and 'batch'. This makes using old and new versions of
> >> >    git with 'batch' mode a little trickier, but hopefully people will
> >> >    generally be moving forward in versions.
> >> >
> >> > [1] See
> >> > https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
> >> > [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/
> >>
> >> I really think leaving that in-place is just being unnecessarily
> >> cavalier. There are a lot of mixed-version environments in which git is
> >> deployed, and we almost never break the configuration in this way (I
> >> think in the past always by mistake).
> >
> >> In this case it's easy to avoid it, and coming up with a less narrow
> >> config model[1] seems like a good idea in any case to unify the various
> >> outstanding work in this area.
> >>
> >> More generally on this series, per the thread ending in [2] I really
> >
> > My primary goal in all of these changes is to move git-for-windows over
> > to a default of batch fsync so that it can get closer to other platforms
> > in 'git add' performance while still retaining the same level of data
> > integrity. I'm hoping that most end-users are just sticking to defaults
> > here.
> >
> > I'm happy to change the configuration schema again if there's a
> > consensus from the Git community that backwards-compatibility of the
> > configuration is actually important to someone.
> >
> > Also, if we're doing a deeper rethink of the fsync configuration (as
> > prompted by this work and Eric Wong's and Patrick Steinhardt's work), do
> > we want to retain a mode where we fsync some parts of the persistent
> > repo data but not others? If we add fsyncing of the index in addition to
> > the refs, I believe we would have covered all of the critical data
> > structures that would be needed to find the data that a user has added
> > to the repo if they complete a series of git commands and then
> > experience a system crash.
>
> Just talking about it is how we'll find consensus; maybe you & Junio
> would like to keep it as-is. I don't see why we'd expose this bad edge
> case in configuration handling to users when it's entirely avoidable,
> and we're still in the design phase.

After trying to figure out an implementation, I have a new proposal,
which I've shared on the other thread [1].

[1] https://lore.kernel.org/git/CANQDOdcdhfGtPg0PxpXQA5gQ4x9VknKDKCCi4HEB0Z1xgnjKzg@mail.gmail.com/

>
> >> don't get why we have code like this:
> >>
> >>         @@ -503,10 +504,12 @@ static void unpack_all(void)
> >>                 if (!quiet)
> >>                         progress = start_progress(_("Unpacking objects"), nr_objects);
> >>                 CALLOC_ARRAY(obj_list, nr_objects);
> >>         +       plug_bulk_checkin();
> >>                 for (i = 0; i < nr_objects; i++) {
> >>                         unpack_one(i);
> >>                         display_progress(progress, i + 1);
> >>                 }
> >>         +       unplug_bulk_checkin();
> >>                 stop_progress(&progress);
> >>
> >>                 if (delta_list)
> >>
> >> As opposed to doing an fsync on the last object we're
> >> processing. I.e. why do we need the step of intentionally making the
> >> objects unavailable in the tmp-objdir, and creating a "cookie" file to
> >> sync at the start/end, as opposed to fsyncing on the last file (which
> >> we're writing out anyway).
> >>
> >> 1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
> >> 2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/
> >
> > It's important to not expose an object's final name until its contents
> > have been fsynced to disk. We want to ensure that whenever we crash, we
> > won't have a loose object that Git may later try to open where the
> > filename doesn't match the content hash. I believe it's okay for a given
> > OID to be missing, since a later command could recreate it, but an
> > object with a wrong hash looks like it would persist until we do a
> > git-fsck.
>
> Yes, we handle that rather badly, as I mentioned in some other threads,
> but not doing the fsync on the last object v.s. a "cookie" file right
> afterwards seems like a hail-mary at best, no?
>

I'm not quite grasping what you're saying here. Are you saying that
using a dummy file instead of one of the actual objects is less likely
to produce the desired outcome on actual filesystem implementations?

> > I thought about figuring out how to sync the last object rather than
> > some random "cookie" file, but it wasn't clear to me how I'd figure out
> > which object is actually last from library code in a way that doesn't
> > burden each command with somehow figuring out its last object and
> > communicating that. The 'cookie' approach seems to lead to a cleaner
> > interface for callers.
>
> The above quoted code is looping through nr_objects isn't it? Can't a
> "do fsync" be passed down to unpack_one() when we process the last loose
> object?

Are you proposing that we do something different for unpack_objects
versus update_index and git-add? I was hoping to keep all of the users
of the batch fsync functionality equivalent. For the git-add workflow
and update-index, we'd need to track the most recent file so that we
can go back and fsync it. I don't believe that syncing the last object
composes well with the existing implementation of those commands.
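To make the trade-off concrete, here is a minimal, hypothetical sketch in portable C of the cookie-based protocol described above. It is not Git's bulk-checkin code; struct fsync_batch, batch_add(), batch_flush() and writeout_only() are invented names. Each object is written under a temporary tmp_obj_ name, and at "unplug" time one full fsync() of a throwaway cookie file marks the single disk-cache flush, after which the objects are renamed to their final names. The caller never has to identify its last object. Because POSIX has no "start writeback without a cache flush" call, writeout_only() falls back to fsync() here; that keeps the sketch durable but gives up the speedup, and it is only where a platform-specific primitive would be assumed to go.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_STAGED 64
#define PATH_LEN 512

/* Hypothetical batch state: objects written but not yet visible by name. */
struct fsync_batch {
	const char *dir;                 /* directory holding the objects */
	int nr;                          /* number of staged objects */
	char tmp[MAX_STAGED][PATH_LEN];  /* temporary (hidden) names */
	char final[MAX_STAGED][PATH_LEN];/* names published by batch_flush() */
};

/*
 * Hypothetical hook for a cheap writeout-only request. Portable POSIX has
 * no such call, so this falls back to fsync(): safe, but not faster.
 */
static int writeout_only(int fd)
{
	return fsync(fd);
}

/* Write one object under a temporary name; its final name stays hidden. */
static int batch_add(struct fsync_batch *b, const char *name,
		     const void *data, size_t len)
{
	int fd;

	assert(name && data);
	if (b->nr >= MAX_STAGED)
		return -1;
	snprintf(b->tmp[b->nr], PATH_LEN, "%s/tmp_obj_XXXXXX", b->dir);
	fd = mkstemp(b->tmp[b->nr]);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || writeout_only(fd) < 0) {
		close(fd);
		unlink(b->tmp[b->nr]);
		return -1;
	}
	close(fd);
	snprintf(b->final[b->nr], PATH_LEN, "%s/%s", b->dir, name);
	b->nr++;
	return 0;
}

/*
 * "Unplug": a full fsync() of a dedicated cookie file marks the single
 * flush point (redundant here, since writeout_only() already fsyncs), and
 * only afterwards are the final names published via rename().
 */
static int batch_flush(struct fsync_batch *b)
{
	char cookie[PATH_LEN];
	int fd, i, ret = 0;

	snprintf(cookie, sizeof(cookie), "%s/tmp_cookie_XXXXXX", b->dir);
	fd = mkstemp(cookie);
	if (fd < 0)
		return -1;
	if (fsync(fd) < 0)
		ret = -1;
	close(fd);
	unlink(cookie);

	for (i = 0; i < b->nr && !ret; i++)
		if (rename(b->tmp[i], b->final[i]) < 0)
			ret = -1;
	b->nr = 0;
	return ret;
}
```

Until batch_flush() runs, a crash can only leave unreferenced tmp_obj_ files behind (which a prune can clean up), never a final name whose contents are missing.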

Thanks,
Neeraj

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases
  2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
@ 2021-11-30 21:27                   ` Elijah Newren
  2021-11-30 21:52                     ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: Elijah Newren @ 2021-11-30 21:27 UTC (permalink / raw)
  To: Neeraj Singh via GitGitGadget
  Cc: Git Mailing List, Neeraj-Personal, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig,
	Ævar Arnfjörð Bjarmason, Randall S. Becker,
	Bagas Sanjaya, Neeraj K. Singh

On Mon, Nov 15, 2021 at 3:51 PM Neeraj Singh via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Neeraj Singh <neerajsi@microsoft.com>
>
> The tmp_objdir API provides the ability to create temporary object
> directories, but was designed with the goal of having subprocesses
> access these object stores, followed by the main process migrating
> objects from it to the main object store or just deleting it.  The
> subprocesses would view it as their primary datastore and write to it.
>
> Here we add the tmp_objdir_replace_primary_odb function that replaces
> the current process's writable "main" object directory with the
> specified one. The previous main object directory is restored in either
> tmp_objdir_migrate or tmp_objdir_destroy.
>
> For the --remerge-diff usecase, add a new `will_destroy` flag in `struct
> object_database` to mark ephemeral object databases that do not require
> fsync durability.
>
> Add 'git prune' support for removing temporary object databases, and
> make sure that they have a name starting with tmp_ and containing an
> operation-specific name.
>
> Based-on-patch-by: Elijah Newren <newren@gmail.com>
>
> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/prune.c        | 23 +++++++++++++++---
>  builtin/receive-pack.c |  2 +-
>  environment.c          |  5 ++++
>  object-file.c          | 44 +++++++++++++++++++++++++++++++--
>  object-store.h         | 19 +++++++++++++++
>  object.c               |  2 +-
>  tmp-objdir.c           | 55 +++++++++++++++++++++++++++++++++++++++---
>  tmp-objdir.h           | 29 +++++++++++++++++++---
>  8 files changed, 165 insertions(+), 14 deletions(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 485c9a3c56f..a76e6a5f0e8 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -18,6 +18,7 @@ static int show_only;
>  static int verbose;
>  static timestamp_t expire;
>  static int show_progress = -1;
> +static struct strbuf remove_dir_buf = STRBUF_INIT;
>
>  static int prune_tmp_file(const char *fullpath)
>  {
> @@ -26,10 +27,20 @@ static int prune_tmp_file(const char *fullpath)
>                 return error("Could not stat '%s'", fullpath);
>         if (st.st_mtime > expire)
>                 return 0;
> -       if (show_only || verbose)
> -               printf("Removing stale temporary file %s\n", fullpath);
> -       if (!show_only)
> -               unlink_or_warn(fullpath);
> +       if (S_ISDIR(st.st_mode)) {
> +               if (show_only || verbose)
> +                       printf("Removing stale temporary directory %s\n", fullpath);
> +               if (!show_only) {
> +                       strbuf_reset(&remove_dir_buf);
> +                       strbuf_addstr(&remove_dir_buf, fullpath);
> +                       remove_dir_recursively(&remove_dir_buf, 0);

Why not just define remove_dir_buf here rather than as a global?  It'd
not only make the code more readable by keeping everything localized,
it would have prevented the forgotten strbuf_reset() bug from the
earlier round of this patch.

Sure, that'd be an extra memory allocation/free for each directory you
hit, which should be negligible compared to the cost of
remove_dir_recursively()...and I'm not sure this is performance
critical anyway (I don't see why we'd expect more than O(1) cruft
temporary directories).
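Elijah's point can be illustrated with a standalone POSIX sketch. It is not Git's code: it uses nftw() instead of Git's remove_dir_recursively(), a plain char buffer instead of a strbuf, and the function names are hypothetical. Because the path buffer lives inside the loop body, there is no shared state that has to be reset between entries. It also follows the patch's convention that any object-directory entry whose name starts with tmp_ is cruft once it is older than the expiry time.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* nftw() callback: FTW_DEPTH reports children before their directory. */
static int remove_entry(const char *path, const struct stat *st,
			int type, struct FTW *ftw)
{
	(void)st;
	(void)ftw;
	return type == FTW_DP ? rmdir(path) : unlink(path);
}

/* Remove one stale entry; returns 1 if removed, 0 if kept, -1 on error. */
static int prune_tmp_path(const char *path, time_t expire,
			  int show_only, int verbose)
{
	struct stat st;

	if (lstat(path, &st) < 0)
		return -1;
	if (st.st_mtime > expire)
		return 0;
	if (S_ISDIR(st.st_mode)) {
		if (show_only || verbose)
			printf("Removing stale temporary directory %s\n", path);
		if (!show_only &&
		    nftw(path, remove_entry, 16, FTW_DEPTH | FTW_PHYS) < 0)
			return -1;
	} else {
		if (show_only || verbose)
			printf("Removing stale temporary file %s\n", path);
		if (!show_only && unlink(path) < 0)
			return -1;
	}
	return !show_only;
}

/* Scan an object directory and prune stale entries named "tmp_*". */
static int prune_tmp_entries(const char *objdir, time_t expire,
			     int show_only, int verbose)
{
	DIR *dir = opendir(objdir);
	struct dirent *de;
	int removed = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		/* Local to each iteration: nothing to reset or release. */
		char path[4096];

		if (strncmp(de->d_name, "tmp_", 4))
			continue;
		snprintf(path, sizeof(path), "%s/%s", objdir, de->d_name);
		if (prune_tmp_path(path, expire, show_only, verbose) > 0)
			removed++;
	}
	closedir(dir);
	return removed;
}
```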

> +               }
> +       } else {
> +               if (show_only || verbose)
> +                       printf("Removing stale temporary file %s\n", fullpath);
> +               if (!show_only)
> +                       unlink_or_warn(fullpath);
> +       }
>         return 0;
>  }
>
> @@ -97,6 +108,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
>
>  static int prune_subdir(unsigned int nr, const char *path, void *data)
>  {
> +       if (verbose)

Shouldn't this be
    if (show_only || verbose)
?

> +               printf("Removing directory %s\n", path);
> +
>         if (!show_only)
>                 rmdir(path);
>         return 0;
> @@ -184,5 +198,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>                 prune_shallow(show_only ? PRUNE_SHOW_ONLY : 0);
>         }
>
> +       strbuf_release(&remove_dir_buf);
>         return 0;
>  }
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 49b846d9605..8815e24cde5 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -2213,7 +2213,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>                 strvec_push(&child.args, alt_shallow_file);
>         }
>
> -       tmp_objdir = tmp_objdir_create();
> +       tmp_objdir = tmp_objdir_create("incoming");
>         if (!tmp_objdir) {
>                 if (err_fd > 0)
>                         close(err_fd);
> diff --git a/environment.c b/environment.c
> index 9da7f3c1a19..342400fcaad 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -17,6 +17,7 @@
>  #include "commit.h"
>  #include "strvec.h"
>  #include "object-store.h"
> +#include "tmp-objdir.h"
>  #include "chdir-notify.h"
>  #include "shallow.h"
>
> @@ -331,10 +332,14 @@ static void update_relative_gitdir(const char *name,
>                                    void *data)
>  {
>         char *path = reparent_relative_path(old_cwd, new_cwd, get_git_dir());
> +       struct tmp_objdir *tmp_objdir = tmp_objdir_unapply_primary_odb();
>         trace_printf_key(&trace_setup_key,
>                          "setup: move $GIT_DIR to '%s'",
>                          path);
> +
>         set_git_dir_1(path);
> +       if (tmp_objdir)
> +               tmp_objdir_reapply_primary_odb(tmp_objdir, old_cwd, new_cwd);
>         free(path);
>  }
>
> diff --git a/object-file.c b/object-file.c
> index c3d866a287e..0b6a61aeaff 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -683,6 +683,43 @@ void add_to_alternates_memory(const char *reference)
>                              '\n', NULL, 0);
>  }
>
> +struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy)
> +{
> +       struct object_directory *new_odb;
> +
> +       /*
> +        * Make sure alternates are initialized, or else our entry may be
> +        * overwritten when they are.
> +        */
> +       prepare_alt_odb(the_repository);
> +
> +       /*
> +        * Make a new primary odb and link the old primary ODB in as an
> +        * alternate
> +        */
> +       new_odb = xcalloc(1, sizeof(*new_odb));
> +       new_odb->path = xstrdup(dir);
> +       new_odb->will_destroy = will_destroy;
> +       new_odb->next = the_repository->objects->odb;
> +       the_repository->objects->odb = new_odb;
> +       return new_odb->next;
> +}
> +
> +void restore_primary_odb(struct object_directory *restore_odb, const char *old_path)
> +{
> +       struct object_directory *cur_odb = the_repository->objects->odb;
> +
> +       if (strcmp(old_path, cur_odb->path))
> +               BUG("expected %s as primary object store; found %s",
> +                   old_path, cur_odb->path);
> +
> +       if (cur_odb->next != restore_odb)
> +               BUG("we expect the old primary object store to be the first alternate");
> +
> +       the_repository->objects->odb = restore_odb;
> +       free_object_directory(cur_odb);
> +}
> +
>  /*
>   * Compute the exact path an alternate is at and returns it. In case of
>   * error NULL is returned and the human readable error is added to `err`
> @@ -1809,8 +1846,11 @@ int hash_object_file(const struct git_hash_algo *algo, const void *buf,
>  /* Finalize a file on disk, and close it. */
>  static void close_loose_object(int fd)
>  {
> -       if (fsync_object_files)
> -               fsync_or_die(fd, "loose object file");
> +       if (!the_repository->objects->odb->will_destroy) {
> +               if (fsync_object_files)
> +                       fsync_or_die(fd, "loose object file");
> +       }
> +
>         if (close(fd) != 0)
>                 die_errno(_("error when closing loose object file"));
>  }
> diff --git a/object-store.h b/object-store.h
> index 952efb6a4be..cb173e69392 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -27,6 +27,11 @@ struct object_directory {
>         uint32_t loose_objects_subdir_seen[8]; /* 256 bits */
>         struct oidtree *loose_objects_cache;
>
> +       /*
> +        * This object store is ephemeral, so there is no need to fsync.
> +        */
> +       int will_destroy;
> +
>         /*
>          * Path to the alternative object store. If this is a relative path,
>          * it is relative to the current working directory.
> @@ -58,6 +63,17 @@ void add_to_alternates_file(const char *dir);
>   */
>  void add_to_alternates_memory(const char *dir);
>
> +/*
> + * Replace the current writable object directory with the specified temporary
> + * object directory; returns the former primary object directory.
> + */
> +struct object_directory *set_temporary_primary_odb(const char *dir, int will_destroy);
> +
> +/*
> + * Restore a previous ODB replaced by set_temporary_primary_odb.
> + */
> +void restore_primary_odb(struct object_directory *restore_odb, const char *old_path);
> +
>  /*
>   * Populate and return the loose object cache array corresponding to the
>   * given object ID.
> @@ -68,6 +84,9 @@ struct oidtree *odb_loose_cache(struct object_directory *odb,
>  /* Empty the loose object cache for the specified object directory. */
>  void odb_clear_loose_cache(struct object_directory *odb);
>
> +/* Clear and free the specified object directory */
> +void free_object_directory(struct object_directory *odb);
> +
>  struct packed_git {
>         struct hashmap_entry packmap_ent;
>         struct packed_git *next;
> diff --git a/object.c b/object.c
> index 23a24e678a8..048f96a260e 100644
> --- a/object.c
> +++ b/object.c
> @@ -513,7 +513,7 @@ struct raw_object_store *raw_object_store_new(void)
>         return o;
>  }
>
> -static void free_object_directory(struct object_directory *odb)
> +void free_object_directory(struct object_directory *odb)
>  {
>         free(odb->path);
>         odb_clear_loose_cache(odb);
> diff --git a/tmp-objdir.c b/tmp-objdir.c
> index b8d880e3626..3d38eeab66b 100644
> --- a/tmp-objdir.c
> +++ b/tmp-objdir.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "tmp-objdir.h"
> +#include "chdir-notify.h"
>  #include "dir.h"
>  #include "sigchain.h"
>  #include "string-list.h"
> @@ -11,6 +12,8 @@
>  struct tmp_objdir {
>         struct strbuf path;
>         struct strvec env;
> +       struct object_directory *prev_odb;
> +       int will_destroy;
>  };
>
>  /*
> @@ -38,6 +41,9 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
>         if (t == the_tmp_objdir)
>                 the_tmp_objdir = NULL;
>
> +       if (!on_signal && t->prev_odb)
> +               restore_primary_odb(t->prev_odb, t->path.buf);
> +
>         /*
>          * This may use malloc via strbuf_grow(), but we should
>          * have pre-grown t->path sufficiently so that this
> @@ -52,6 +58,7 @@ static int tmp_objdir_destroy_1(struct tmp_objdir *t, int on_signal)
>          */
>         if (!on_signal)
>                 tmp_objdir_free(t);
> +
>         return err;
>  }
>
> @@ -121,7 +128,7 @@ static int setup_tmp_objdir(const char *root)
>         return ret;
>  }
>
> -struct tmp_objdir *tmp_objdir_create(void)
> +struct tmp_objdir *tmp_objdir_create(const char *prefix)
>  {
>         static int installed_handlers;
>         struct tmp_objdir *t;
> @@ -129,11 +136,16 @@ struct tmp_objdir *tmp_objdir_create(void)
>         if (the_tmp_objdir)
>                 BUG("only one tmp_objdir can be used at a time");
>
> -       t = xmalloc(sizeof(*t));
> +       t = xcalloc(1, sizeof(*t));
>         strbuf_init(&t->path, 0);
>         strvec_init(&t->env);
>
> -       strbuf_addf(&t->path, "%s/incoming-XXXXXX", get_object_directory());
> +       /*
> +        * Use a string starting with tmp_ so that the builtin/prune.c code
> +        * can recognize any stale objdirs left behind by a crash and delete
> +        * them.
> +        */
> +       strbuf_addf(&t->path, "%s/tmp_objdir-%s-XXXXXX", get_object_directory(), prefix);
>
>         /*
>          * Grow the strbuf beyond any filename we expect to be placed in it.
> @@ -269,6 +281,13 @@ int tmp_objdir_migrate(struct tmp_objdir *t)
>         if (!t)
>                 return 0;
>
> +       if (t->prev_odb) {
> +               if (the_repository->objects->odb->will_destroy)
> +                       BUG("migrating an ODB that was marked for destruction");
> +               restore_primary_odb(t->prev_odb, t->path.buf);
> +               t->prev_odb = NULL;
> +       }
> +
>         strbuf_addbuf(&src, &t->path);
>         strbuf_addstr(&dst, get_object_directory());
>
> @@ -292,3 +311,33 @@ void tmp_objdir_add_as_alternate(const struct tmp_objdir *t)
>  {
>         add_to_alternates_memory(t->path.buf);
>  }
> +
> +void tmp_objdir_replace_primary_odb(struct tmp_objdir *t, int will_destroy)
> +{
> +       if (t->prev_odb)
> +               BUG("the primary object database is already replaced");
> +       t->prev_odb = set_temporary_primary_odb(t->path.buf, will_destroy);
> +       t->will_destroy = will_destroy;
> +}
> +
> +struct tmp_objdir *tmp_objdir_unapply_primary_odb(void)
> +{
> +       if (!the_tmp_objdir || !the_tmp_objdir->prev_odb)
> +               return NULL;
> +
> +       restore_primary_odb(the_tmp_objdir->prev_odb, the_tmp_objdir->path.buf);
> +       the_tmp_objdir->prev_odb = NULL;
> +       return the_tmp_objdir;
> +}
> +
> +void tmp_objdir_reapply_primary_odb(struct tmp_objdir *t, const char *old_cwd,
> +               const char *new_cwd)
> +{
> +       char *path;
> +
> +       path = reparent_relative_path(old_cwd, new_cwd, t->path.buf);
> +       strbuf_reset(&t->path);
> +       strbuf_addstr(&t->path, path);
> +       free(path);
> +       tmp_objdir_replace_primary_odb(t, t->will_destroy);
> +}
> diff --git a/tmp-objdir.h b/tmp-objdir.h
> index b1e45b4c75d..a3145051f25 100644
> --- a/tmp-objdir.h
> +++ b/tmp-objdir.h
> @@ -10,7 +10,7 @@
>   *
>   * Example:
>   *
> - *     struct tmp_objdir *t = tmp_objdir_create();
> + *     struct tmp_objdir *t = tmp_objdir_create("incoming");
>   *     if (!run_command_v_opt_cd_env(cmd, 0, NULL, tmp_objdir_env(t)) &&
>   *         !tmp_objdir_migrate(t))
>   *             printf("success!\n");
> @@ -22,9 +22,10 @@
>  struct tmp_objdir;
>
>  /*
> - * Create a new temporary object directory; returns NULL on failure.
> + * Create a new temporary object directory with the specified prefix;
> + * returns NULL on failure.
>   */
> -struct tmp_objdir *tmp_objdir_create(void);
> +struct tmp_objdir *tmp_objdir_create(const char *prefix);
>
>  /*
>   * Return a list of environment strings, suitable for use with
> @@ -51,4 +52,26 @@ int tmp_objdir_destroy(struct tmp_objdir *);
>   */
>  void tmp_objdir_add_as_alternate(const struct tmp_objdir *);
>
> +/*
> + * Replaces the main object store in the current process with the temporary
> + * object directory and makes the former main object store an alternate.
> + * If will_destroy is nonzero, the object directory may not be migrated.
> + */
> +void tmp_objdir_replace_primary_odb(struct tmp_objdir *, int will_destroy);
> +
> +/*
> + * If the primary object database was replaced by a temporary object directory,
> + * restore it to its original value while keeping the directory contents around.
> + * Returns NULL if the primary object database was not replaced.
> + */
> +struct tmp_objdir *tmp_objdir_unapply_primary_odb(void);
> +
> +/*
> + * Reapplies the former primary temporary object database, after potentially
> + * changing its relative path.
> + */
> +void tmp_objdir_reapply_primary_odb(struct tmp_objdir *, const char *old_cwd,
> +               const char *new_cwd);
> +
> +
>  #endif /* TMP_OBJDIR_H */
> --
> gitgitgadget

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases
  2021-11-30 21:27                   ` Elijah Newren
@ 2021-11-30 21:52                     ` Neeraj Singh
  2021-11-30 22:36                       ` Elijah Newren
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2021-11-30 21:52 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Neeraj Singh via GitGitGadget, Git Mailing List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Tue, Nov 30, 2021 at 1:27 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Mon, Nov 15, 2021 at 3:51 PM Neeraj Singh via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Neeraj Singh <neerajsi@microsoft.com>
> >
> > The tmp_objdir API provides the ability to create temporary object
> > directories, but was designed with the goal of having subprocesses
> > access these object stores, followed by the main process migrating
> > objects from it to the main object store or just deleting it.  The
> > subprocesses would view it as their primary datastore and write to it.
> >
> > Here we add the tmp_objdir_replace_primary_odb function that replaces
> > the current process's writable "main" object directory with the
> > specified one. The previous main object directory is restored in either
> > tmp_objdir_migrate or tmp_objdir_destroy.
> >
> > For the --remerge-diff usecase, add a new `will_destroy` flag in `struct
> > object_database` to mark ephemeral object databases that do not require
> > fsync durability.
> >
> > Add 'git prune' support for removing temporary object databases, and
> > make sure that they have a name starting with tmp_ and containing an
> > operation-specific name.
> >
> > Based-on-patch-by: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> >  builtin/prune.c        | 23 +++++++++++++++---
> >  builtin/receive-pack.c |  2 +-
> >  environment.c          |  5 ++++
> >  object-file.c          | 44 +++++++++++++++++++++++++++++++--
> >  object-store.h         | 19 +++++++++++++++
> >  object.c               |  2 +-
> >  tmp-objdir.c           | 55 +++++++++++++++++++++++++++++++++++++++---
> >  tmp-objdir.h           | 29 +++++++++++++++++++---
> >  8 files changed, 165 insertions(+), 14 deletions(-)
> >
> > diff --git a/builtin/prune.c b/builtin/prune.c
> > index 485c9a3c56f..a76e6a5f0e8 100644
> > --- a/builtin/prune.c
> > +++ b/builtin/prune.c
> > @@ -18,6 +18,7 @@ static int show_only;
> >  static int verbose;
> >  static timestamp_t expire;
> >  static int show_progress = -1;
> > +static struct strbuf remove_dir_buf = STRBUF_INIT;
> >
> >  static int prune_tmp_file(const char *fullpath)
> >  {
> > @@ -26,10 +27,20 @@ static int prune_tmp_file(const char *fullpath)
> >                 return error("Could not stat '%s'", fullpath);
> >         if (st.st_mtime > expire)
> >                 return 0;
> > -       if (show_only || verbose)
> > -               printf("Removing stale temporary file %s\n", fullpath);
> > -       if (!show_only)
> > -               unlink_or_warn(fullpath);
> > +       if (S_ISDIR(st.st_mode)) {
> > +               if (show_only || verbose)
> > +                       printf("Removing stale temporary directory %s\n", fullpath);
> > +               if (!show_only) {
> > +                       strbuf_reset(&remove_dir_buf);
> > +                       strbuf_addstr(&remove_dir_buf, fullpath);
> > +                       remove_dir_recursively(&remove_dir_buf, 0);
>
> Why not just define remove_dir_buf here rather than as a global?  It'd
> not only make the code more readable by keeping everything localized,
> it would have prevented the forgotten strbuf_reset() bug from the
> earlier round of this patch.
>
> Sure, that'd be an extra memory allocation/free for each directory you
> hit, which should be negligible compared to the cost of
> remove_dir_recursively()...and I'm not sure this is performance
> critical anyway (I don't see why we'd expect more than O(1) cruft
> temporary directories).

I'll take this suggestion.

> > +               }
> > +       } else {
> > +               if (show_only || verbose)
> > +                       printf("Removing stale temporary file %s\n", fullpath);
> > +               if (!show_only)
> > +                       unlink_or_warn(fullpath);
> > +       }
> >         return 0;
> >  }
> >
> > @@ -97,6 +108,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
> >
> >  static int prune_subdir(unsigned int nr, const char *path, void *data)
> >  {
> > +       if (verbose)
>
> Shouldn't this be
>     if (show_only || verbose)
> ?

Doing that breaks one of the tests, since we print extra stuff that's
unexpected. I think I'm going to just revert this change, since it
appears that we call this callback and try to remove the directory
even if it's non-empty.

Do you have any comments or thoughts on how we want to allow the user
to configure fsync settings?

Thanks,
Neeraj

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases
  2021-11-30 21:52                     ` Neeraj Singh
@ 2021-11-30 22:36                       ` Elijah Newren
  0 siblings, 0 replies; 160+ messages in thread
From: Elijah Newren @ 2021-11-30 22:36 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj Singh via GitGitGadget, Git Mailing List,
	Johannes Schindelin, Jeff King, Jeff Hostetler,
	Christoph Hellwig, Ævar Arnfjörð Bjarmason,
	Randall S. Becker, Bagas Sanjaya, Neeraj K. Singh

On Tue, Nov 30, 2021 at 1:52 PM Neeraj Singh <nksingh85@gmail.com> wrote:
>
> On Tue, Nov 30, 2021 at 1:27 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > On Mon, Nov 15, 2021 at 3:51 PM Neeraj Singh via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: Neeraj Singh <neerajsi@microsoft.com>
> > >
> > > The tmp_objdir API provides the ability to create temporary object
> > > directories, but was designed with the goal of having subprocesses
> > > access these object stores, followed by the main process migrating
> > > objects from it to the main object store or just deleting it.  The
> > > subprocesses would view it as their primary datastore and write to it.
> > >
> > > Here we add the tmp_objdir_replace_primary_odb function that replaces
> > > the current process's writable "main" object directory with the
> > > specified one. The previous main object directory is restored in either
> > > tmp_objdir_migrate or tmp_objdir_destroy.
> > >
> > > For the --remerge-diff usecase, add a new `will_destroy` flag in `struct
> > > object_database` to mark ephemeral object databases that do not require
> > > fsync durability.
> > >
> > > Add 'git prune' support for removing temporary object databases, and
> > > make sure that they have a name starting with tmp_ and containing an
> > > operation-specific name.
> > >
> > > Based-on-patch-by: Elijah Newren <newren@gmail.com>
> > >
> > > Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
> > > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > > ---
> > >  builtin/prune.c        | 23 +++++++++++++++---
> > >  builtin/receive-pack.c |  2 +-
> > >  environment.c          |  5 ++++
> > >  object-file.c          | 44 +++++++++++++++++++++++++++++++--
> > >  object-store.h         | 19 +++++++++++++++
> > >  object.c               |  2 +-
> > >  tmp-objdir.c           | 55 +++++++++++++++++++++++++++++++++++++++---
> > >  tmp-objdir.h           | 29 +++++++++++++++++++---
> > >  8 files changed, 165 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/builtin/prune.c b/builtin/prune.c
> > > index 485c9a3c56f..a76e6a5f0e8 100644
> > > --- a/builtin/prune.c
> > > +++ b/builtin/prune.c
> > > @@ -18,6 +18,7 @@ static int show_only;
> > >  static int verbose;
> > >  static timestamp_t expire;
> > >  static int show_progress = -1;
> > > +static struct strbuf remove_dir_buf = STRBUF_INIT;
> > >
> > >  static int prune_tmp_file(const char *fullpath)
> > >  {
> > > @@ -26,10 +27,20 @@ static int prune_tmp_file(const char *fullpath)
> > >                 return error("Could not stat '%s'", fullpath);
> > >         if (st.st_mtime > expire)
> > >                 return 0;
> > > -       if (show_only || verbose)
> > > -               printf("Removing stale temporary file %s\n", fullpath);
> > > -       if (!show_only)
> > > -               unlink_or_warn(fullpath);
> > > +       if (S_ISDIR(st.st_mode)) {
> > > +               if (show_only || verbose)
> > > +                       printf("Removing stale temporary directory %s\n", fullpath);
> > > +               if (!show_only) {
> > > +                       strbuf_reset(&remove_dir_buf);
> > > +                       strbuf_addstr(&remove_dir_buf, fullpath);
> > > +                       remove_dir_recursively(&remove_dir_buf, 0);
> >
> > Why not just define remove_dir_buf here rather than as a global?  It'd
> > not only make the code more readable by keeping everything localized,
> > it would have prevented the forgotten strbuf_reset() bug from the
> > earlier round of this patch.
> >
> > Sure, that'd be an extra memory allocation/free for each directory you
> > hit, which should be negligible compared to the cost of
> > remove_dir_recursively()...and I'm not sure this is performance
> > critical anyway (I don't see why we'd expect more than O(1) cruft
> > temporary directories).
>
> I'll take this suggestion.
>
> > > +               }
> > > +       } else {
> > > +               if (show_only || verbose)
> > > +                       printf("Removing stale temporary file %s\n", fullpath);
> > > +               if (!show_only)
> > > +                       unlink_or_warn(fullpath);
> > > +       }
> > >         return 0;
> > >  }
> > >
> > > @@ -97,6 +108,9 @@ static int prune_cruft(const char *basename, const char *path, void *data)
> > >
> > >  static int prune_subdir(unsigned int nr, const char *path, void *data)
> > >  {
> > > +       if (verbose)
> >
> > Shouldn't this be
> >     if (show_only || verbose)
> > ?
>
> Doing that breaks one of the tests, since we print extra stuff that's
> unexpected. I think I'm going to just revert this change, since it
> appears that we call this callback and try to remove the directory
> even if it's non-empty.

Makes sense.

> Do you have any comments or thoughts on how we want to allow the user
> to configure fsync settings?

I don't; sorry.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-11-18  5:03                       ` Neeraj Singh
@ 2021-12-01 14:15                         ` Ævar Arnfjörð Bjarmason
  2022-03-09 23:02                           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 160+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-01 14:15 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong


On Wed, Nov 17 2021, Neeraj Singh wrote:

[Very late reply, sorry]

> On Tue, Nov 16, 2021 at 11:28 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>>
>> On Tue, Nov 16 2021, Neeraj Singh wrote:
>>
>> > On Tue, Nov 16, 2021 at 12:10 AM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >>
>> >>
>> >> On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:
>> >>
>> >> >  * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
>> >> >    'true', 'false', and 'batch'. This makes using old and new versions of
>> >> >    git with 'batch' mode a little trickier, but hopefully people will
>> >> >    generally be moving forward in versions.
>> >> >
>> >> > [1] See
>> >> > https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
>> >> > [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/
>> >>
>> >> I really think leaving that in-place is just being unnecessarily
>> >> cavalier. There's a lot of mixed-version environments where git is
>> >> deployed in, and we almost never break the configuration in this way (I
>> >> think in the past always by mistake).
>> >
>> >> In this case it's easy to avoid it, and coming up with a less narrow
>> >> config model[1] seems like a good idea in any case to unify the various
>> >> outstanding work in this area.
>> >>
>> >> More generally on this series, per the thread ending in [2] I really
>> >
>> > My primary goal in all of these changes is to move git-for-windows over
>> > to a default of batch fsync so that it can get closer to other platforms
>> > in performance of 'git add' while still retaining the same level of data
>> > integrity. I'm hoping that most end-users are just sticking to defaults
>> > here.
>> >
>> > I'm happy to change the configuration schema again if there's a
>> > consensus from the Git community that backwards-compatibility of the
>> > configuration is actually important to someone.
>> >
>> > Also, if we're doing a deeper rethink of the fsync configuration (as
>> > prompted by this work and Eric Wong's and Patrick Steinhardt's work), do
>> > we want to retain a mode where we fsync some parts of the persistent
>> > repo data but not others? If we add fsyncing of the index in addition
>> > to the refs, I believe we would have covered all of the critical data
>> > structures that would be needed to find the data that a user has added
>> > to the repo if they complete a series of git commands and then
>> > experience a system crash.
>>
>> Just talking about it is how we'll find consensus, maybe you & Junio
>> would like to keep it as-is. I don't see why we'd expose this bad edge
>> case in configuration handling to users when it's entirely avoidable,
>> and we're still in the design phase.
>
> After trying to figure out an implementation, I have a new proposal,
> which I've shared on the other thread [1].
>
> [1] https://lore.kernel.org/git/CANQDOdcdhfGtPg0PxpXQA5gQ4x9VknKDKCCi4HEB0Z1xgnjKzg@mail.gmail.com/

This LGTM, or something simpler as Junio points out with his "too
fine-grained?" comment as a follow-up. I'm honestly quite apathetic
about what we end up with exactly as long as:

 1. We get the people who are adding these config settings to talk & see if they make
    sense in combination.

 2. We avoid the trap of hard dying on older versions.

>>
>> >> don't get why we have code like this:
>> >>
>> >>         @@ -503,10 +504,12 @@ static void unpack_all(void)
>> >>                 if (!quiet)
>> >>                         progress = start_progress(_("Unpacking objects"), nr_objects);
>> >>                 CALLOC_ARRAY(obj_list, nr_objects);
>> >>         +       plug_bulk_checkin();
>> >>                 for (i = 0; i < nr_objects; i++) {
>> >>                         unpack_one(i);
>> >>                         display_progress(progress, i + 1);
>> >>                 }
>> >>         +       unplug_bulk_checkin();
>> >>                 stop_progress(&progress);
>> >>
>> >>                 if (delta_list)
>> >>
>> >> As opposed to doing an fsync on the last object we're
>> >> processing. I.e. why do we need the step of intentionally making the
>> >> objects unavailable in the tmp-objdir, and creating a "cookie" file to
>> >> sync at the start/end, as opposed to fsyncing on the last file (which
>> >> we're writing out anyway).
>> >>
>> >> 1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
>> >> 2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/
>> >
>> > It's important to not expose an object's final name until its contents
>> > have been fsynced to disk. We want to ensure that wherever we crash, we
>> > won't have a loose object that Git may later try to open where the
>> > filename doesn't match the content hash. I believe it's okay for a
>> > given OID to be missing, since a later command could recreate it, but
>> > an object with a wrong hash looks like it would persist until we do a
>> > git-fsck.
>>
>> Yes, we handle that rather badly, as I mentioned in some other threads,
>> but not doing the fsync on the last object vs. a "cookie" file right
>> afterwards seems like a hail-mary at best, no?
>>
>
> I'm not quite grasping what you're saying here. Are you saying that
> using a dummy file instead of one of the actual objects is less likely
> to produce the desired outcome on actual filesystem implementations?

[...covered below...]

>> > I thought about figuring out how to sync the last object rather than
>> > some random "cookie" file, but it wasn't clear to me how I'd figure out
>> > which object is actually last from library code in a way that doesn't
>> > burden each command with somehow figuring out its last object and
>> > communicating that. The 'cookie' approach seems to lead to a cleaner
>> > interface for callers.
>>
>> The above quoted code is looping through nr_objects isn't it? Can't a
>> "do fsync" be passed down to unpack_one() when we process the last loose
>> object?
>
> Are you proposing that we do something different for unpack_objects
> versus update_index and git-add?  I was hoping to keep all of the users
> of the batch fsync functionality equivalent. For the git-add workflow
> and update-index, we'd need to track the most recent file so that we can
> go back and fsync it.  I don't believe that syncing the last object
> composes well with the existing implementation of those commands.

There are probably cases where we need the cookie. I just mean instead of
the API being (as seen above in the quoted part), pseudocode:

    # A
    bulk_checkin_start_make_cookie();
    n = 10;
    for i in 1..n:
        write_nth(i, fsync: 0);
    bulk_checkin_end_commit_cookie();

To have it be:

    # B
    bulk_checkin_start(do_cookie: 0);
    n = 10;
    for i in 1..n:
        write_nth(i, fsync: (i == n));
    bulk_checkin_end();

Or actually, presumably simpler as:

    # C
    all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
    end_fsync = bulk_checkin_mode() ? 1 : all_fsync;
    n = 10;
    for i in 1..n:
        write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);

I.e. maybe there are cases where you really do need "A", but we're
usually (always?) writing out N objects, and we usually know it at the
same point where you'd want the plug_bulk_checkin/unplug_bulk_checkin,
so just fsyncing the last object/file/ref/whatever means we don't need
the whole ceremony of the cookie file.
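For illustration, pseudocode "C" might look roughly like this in C. This is a minimal sketch, not Git's API: write_nth() and write_all() are hypothetical helpers that write small files into a directory, and the return value simply counts how many fsync() calls were issued. Whether fsync()ing only the last file makes the earlier ones durable is exactly what the rest of this thread debates; the sketch only shows the control flow.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write the i-th small file into dir, optionally fsync()ing it. */
static int write_nth(const char *dir, int i, int do_fsync)
{
	char path[4096], buf[32];
	int fd, len;

	snprintf(path, sizeof(path), "%s/obj-%d", dir, i);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	len = snprintf(buf, sizeof(buf), "object %d\n", i);
	if (write(fd, buf, len) != len || (do_fsync && fsync(fd) < 0)) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Pseudocode "C": in batch mode only the last of the n writes is
 * fsync()ed; otherwise every write follows the general fsync setting.
 * Returns the number of fsync() calls issued, or -1 on error.
 */
static int write_all(const char *dir, int n, int batch_mode, int fsync_enabled)
{
	int all_fsync = batch_mode ? 0 : fsync_enabled;
	int end_fsync = batch_mode ? 1 : all_fsync;
	int i, syncs = 0;

	assert(n > 0);
	for (i = 1; i <= n; i++) {
		int do_fsync = (i == n) ? end_fsync : all_fsync;

		if (write_nth(dir, i, do_fsync) < 0)
			return -1;
		syncs += do_fsync;
	}
	return syncs;
}
```

With n = 10, batch mode issues one fsync(), while the non-batch mode issues ten (or none when fsync is turned off in general).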

I don't mind it per se, but "B" and "C" just seem a lot simpler,
particularly since as those examples show we'll presumably want to pass
down a "do fsync?" to these in general, and we even usually have a
display_progress() in there.

So doesn't just doing "B" or "C" eliminate the need for a cookie
entirely?

Another advantage of that is that you'll presumably want such tracking
anyway even for the case of "A".

Because as soon as you have say a batch operation of writing X objects
and Y refs you'd want to track this anyway. I.e. either only fsync() on
the ref write (particularly if there's just the one ref), or on the last
ref, or for each ref and no object syncs. So this (like "C", except for
the "do_batch" in the "end_fsync" case):

    # D
    do_batch = in_existing_bulk_checkin() ? 1 : 0;
    all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
    end_fsync = bulk_checkin_mode() ? do_batch : all_fsync;
    n = 10;
    for i in 1..n:
        write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);

I mean, usually we'd want the "all refs", I'm just thinking of a case
like "git fast-import" or other known-to-the-user batch operation.

Or, as in the case of my 4bc1fd6e394 (pack-objects: rename .idx files
into place after .bitmap files, 2021-09-09) we'd want to know that we're
writing all of say *.bitmap, *.rev where we currently fsync() all of
them, write *.bitmap, *.rev and *.pack (not sure that one is safe)
without fsync(), and then only fsync (or that and in-place move) the
*.idx.

* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2021-12-01 14:15                         ` Ævar Arnfjörð Bjarmason
@ 2022-03-09 23:02                           ` Ævar Arnfjörð Bjarmason
  2022-03-10  1:16                             ` Neeraj Singh
From: Ævar Arnfjörð Bjarmason @ 2022-03-09 23:02 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong


On Wed, Dec 01 2021, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Nov 17 2021, Neeraj Singh wrote:
>
> [Very late reply, sorry]
>
>> On Tue, Nov 16, 2021 at 11:28 PM Ævar Arnfjörð Bjarmason
>> <avarab@gmail.com> wrote:
>>>
>>>
>>> On Tue, Nov 16 2021, Neeraj Singh wrote:
>>>
>>> > On Tue, Nov 16, 2021 at 12:10 AM Ævar Arnfjörð Bjarmason
>>> > <avarab@gmail.com> wrote:
>>> >>
>>> >>
>>> >> On Mon, Nov 15 2021, Neeraj K. Singh via GitGitGadget wrote:
>>> >>
>>> >> >  * Per [2], I'm leaving the fsyncObjectFiles configuration as is with
>>> >> >    'true', 'false', and 'batch'. This makes using old and new versions of
>>> >> >    git with 'batch' mode a little trickier, but hopefully people will
>>> >> >    generally be moving forward in versions.
>>> >> >
>>> >> > [1] See
>>> >> > https://lore.kernel.org/git/pull.1067.git.1635287730.gitgitgadget@gmail.com/
>>> >> > [2] https://lore.kernel.org/git/xmqqh7cimuxt.fsf@gitster.g/
>>> >>
>>> >> I really think leaving that in-place is just being unnecessarily
>>> >> cavalier. There's a lot of mixed-version environments where git is
>>> >> deployed in, and we almost never break the configuration in this way (I
>>> >> think in the past always by mistake).
>>> >
>>> >> In this case it's easy to avoid it, and coming up with a less narrow
>>> >> config model[1] seems like a good idea in any case to unify the various
>>> >> outstanding work in this area.
>>> >>
>>> >> More generally on this series, per the thread ending in [2] I really
>>> >
>>> > My primary goal in all of these changes is to move git-for-windows over
>>> > to a default of batch fsync so that it can get closer to other
>>> > platforms in performance of 'git add' while still retaining the same
>>> > level of data integrity. I'm hoping that most end-users are just
>>> > sticking to defaults here.
>>> >
>>> > I'm happy to change the configuration schema again if there's a
>>> > consensus from the Git community that backwards-compatibility of the
>>> > configuration is actually important to someone.
>>> >
>>> > Also, if we're doing a deeper rethink of the fsync configuration (as
>>> > prompted by this work and Eric Wong's and Patrick Steinhardt's work),
>>> > do we want to retain a mode where we fsync some parts of the
>>> > persistent repo data but not others?  If we add fsyncing of the index
>>> > in addition to the refs, I believe we would have covered all of the
>>> > critical data structures that would be needed to find the data that a
>>> > user has added to the repo if they complete a series of git commands
>>> > and then experience a system crash.
>>>
>>> Just talking about it is how we'll find consensus, maybe you & Junio
>>> would like to keep it as-is. I don't see why we'd expose this bad edge
>>> case in configuration handling to users when it's entirely avoidable,
>>> and we're still in the design phase.
>>
>> After trying to figure out an implementation, I have a new proposal,
>> which I've shared on the other thread [1].
>>
>> [1] https://lore.kernel.org/git/CANQDOdcdhfGtPg0PxpXQA5gQ4x9VknKDKCCi4HEB0Z1xgnjKzg@mail.gmail.com/
>
> This LGTM, or something simpler as Junio points out with his "too
> fine-grained?" comment as a follow-up. I'm honestly quite apathetic
> about what we end up with exactly as long as:
>
>  1. We get the people who are adding these config settings to talk & see if they make
>     sense in combination.
>
>  2. We avoid the trap of hard dying on older versions.
>
>>>
>>> >> don't get why we have code like this:
>>> >>
>>> >>         @@ -503,10 +504,12 @@ static void unpack_all(void)
>>> >>                 if (!quiet)
>>> >>                         progress = start_progress(_("Unpacking objects"), nr_objects);
>>> >>                 CALLOC_ARRAY(obj_list, nr_objects);
>>> >>         +       plug_bulk_checkin();
>>> >>                 for (i = 0; i < nr_objects; i++) {
>>> >>                         unpack_one(i);
>>> >>                         display_progress(progress, i + 1);
>>> >>                 }
>>> >>         +       unplug_bulk_checkin();
>>> >>                 stop_progress(&progress);
>>> >>
>>> >>                 if (delta_list)
>>> >>
>>> >> As opposed to doing an fsync on the last object we're
>>> >> processing. I.e. why do we need the step of intentionally making the
>>> >> objects unavailable in the tmp-objdir, and creating a "cookie" file to
>>> >> sync at the start/end, as opposed to fsyncing on the last file (which
>>> >> we're writing out anyway).
>>> >>
>>> >> 1. https://lore.kernel.org/git/211110.86r1bogg27.gmgdl@evledraar.gmail.com/
>>> >> 2. https://lore.kernel.org/git/20211111000349.GA703@neerajsi-x1.localdomain/
>>> >
>>> > It's important to not expose an object's final name until its contents
>>> > have been fsynced to disk. We want to ensure that wherever we crash, we
>>> > won't have a loose object that Git may later try to open where the
>>> > filename doesn't match the content hash. I believe it's okay for a
>>> > given OID to be missing, since a later command could recreate it, but
>>> > an object with a wrong hash looks like it would persist until we do a
>>> > git-fsck.
>>>
>>> Yes, we handle that rather badly, as I mentioned in some other threads,
>>> but not doing the fsync on the last object vs. a "cookie" file right
>>> afterwards seems like a hail-mary at best, no?
>>>
>>
>> I'm not quite grasping what you're saying here. Are you saying that
>> using a dummy file instead of one of the actual objects is less likely
>> to produce the desired outcome on actual filesystem implementations?
>
> [...covered below...]
>
>>> > I thought about figuring out how to sync the last object rather than
>>> > some random "cookie" file, but it wasn't clear to me how I'd figure out
>>> > which object is actually last from library code in a way that doesn't
>>> > burden each command with somehow figuring out its last object and
>>> > communicating that. The 'cookie' approach seems to lead to a cleaner
>>> > interface for callers.
>>>
>>> The above quoted code is looping through nr_objects isn't it? Can't a
>>> "do fsync" be passed down to unpack_one() when we process the last loose
>>> object?
>>
>> Are you proposing that we do something different for unpack_objects
>> versus update_index and git-add?  I was hoping to keep all of the users
>> of the batch fsync functionality equivalent. For the git-add workflow
>> and update-index, we'd need to track the most recent file so that we can
>> go back and fsync it.  I don't believe that syncing the last object
>> composes well with the existing implementation of those commands.
>
> There are probably cases where we need the cookie. I just mean instead of
> the API being (as seen above in the quoted part), pseudocode:
>
>     # A
>     bulk_checkin_start_make_cookie();
>     n = 10;
>     for i in 1..n:
>         write_nth(i, fsync: 0);
>     bulk_checkin_end_commit_cookie();
>
> To have it be:
>
>     # B
>     bulk_checkin_start(do_cookie: 0);
>     n = 10;
>     for i in 1..n:
>         write_nth(i, fsync: (i == n));
>     bulk_checkin_end();
>
> Or actually, presumably simpler as:
>
>     # C
>     all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
>     end_fsync = bulk_checkin_mode() ? 1 : all_fsync;
>     n = 10;
>     for i in 1..n:
>         write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);
>
> I.e. maybe there are cases where you really do need "A", but we're
> usually (always?) writing out N objects, and we usually know it at the
> same point where you'd want the plug_bulk_checkin/unplug_bulk_checkin,
> so just fsyncing the last object/file/ref/whatever means we don't need
> the whole ceremony of the cookie file.
>
> I don't mind it per se, but "B" and "C" just seem a lot simpler,
> particularly since as those examples show we'll presumably want to pass
> down a "do fsync?" to these in general, and we even usually have a
> display_progress() in there.
>
> So doesn't just doing "B" or "C" eliminate the need for a cookie
> entirely?
>
> Another advantage of that is that you'll presumably want such tracking
> anyway even for the case of "A".
>
> Because as soon as you have say a batch operation of writing X objects
> and Y refs you'd want to track this anyway. I.e. either only fsync() on
> the ref write (particularly if there's just the one ref), or on the last
> ref, or for each ref and no object syncs. So this (like "C", except for
> the "do_batch" in the "end_fsync" case):
>
>     # D
>     do_batch = in_existing_bulk_checkin() ? 1 : 0;
>     all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
>     end_fsync = bulk_checkin_mode() ? do_batch : all_fsync;
>     n = 10;
>     for i in 1..n:
>         write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);
>
> I mean, usually we'd want the "all refs", I'm just thinking of a case
> like "git fast-import" or other known-to-the-user batch operation.
>
> Or, as in the case of my 4bc1fd6e394 (pack-objects: rename .idx files
> into place after .bitmap files, 2021-09-09) we'd want to know that we're
> writing all of say *.bitmap, *.rev where we currently fsync() all of
> them, write *.bitmap, *.rev and *.pack (not sure that one is safe)
> without fsync(), and then only fsync (or that and in-place move) the
> *.idx.

Replying to an old-ish E-Mail of mine with some more thought that came
to mind after[1] (another recently resurrected fsync() thread).

I wonder if there's another twist on the plan outlined in [2] that would
be both portable & efficient, i.e. the "slow" POSIX way to write files
A..Z is to open/write/close/fsync each one, so we'll trigger a HW flush
N times.

And as we've discussed, doing it just on Z will implicitly flush A..Y on
common OSes in the wild, which we're taking advantage of here.

But aside from the rename() dance in [2], what do those OSes do if you
write A..Z, fsync() the "fd" for Z, and then fsync A..Y (or, presumably
equivalently, in reverse order: Y..A)?

I'd think they'd be smart enough to know that they already implicitly
flushed that data since Z was flushed, and make those fsync() calls a
rather cheap noop.

But I don't know, hence the question.

If that's true then perhaps it's a path towards having our cake and
eating it too in some cases?

I.e. an FS that would flush A..Y if we flush Z would do so quickly and
reliably, whereas an FS that doesn't have such an optimization might be
just as slow for all of A..Y, but at least it'll be safe.
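The question could be tested empirically with something like the POSIX C sketch below (write_then_fsync_reverse() is a hypothetical name, not anything in Git). It writes n files while keeping every descriptor open, then fsync()s the last file ("Z") first and walks back over the rest ("Y..A"). Timing the trailing fsync() calls on a given filesystem is what would answer the question; the code itself does not assume they are cheap, and n is bounded by the process's open-file limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write n files, then fsync() them in reverse order (Z first, then Y..A). */
static int write_then_fsync_reverse(const char *dir, int n)
{
	int *fds;
	int i, opened, ret = 0;

	assert(n > 0);
	fds = malloc((size_t)n * sizeof(*fds));
	if (!fds)
		return -1;
	for (opened = 0; opened < n; opened++) {
		char path[4096];
		int fd;

		snprintf(path, sizeof(path), "%s/file-%d", dir, opened);
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0) {
			ret = -1;
			break;
		}
		fds[opened] = fd;
		if (write(fd, "x\n", 2) != 2) {
			ret = -1;
			opened++;	/* so the cleanup loop closes this fd too */
			break;
		}
	}
	/* Only sync if every file was written; stop at the first failure. */
	for (i = n - 1; ret == 0 && i >= 0; i--)
		if (fsync(fds[i]) < 0)
			ret = -1;
	for (i = 0; i < opened; i++)
		if (close(fds[i]) < 0)
			ret = -1;
	free(fds);
	return ret;
}
```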

1. https://lore.kernel.org/git/220309.867d93lztw.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/e1747ce00af7ab3170a69955b07d995d5321d6f3.1637020263.git.gitgitgadget@gmail.com/

* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2022-03-09 23:02                           ` Ævar Arnfjörð Bjarmason
@ 2022-03-10  1:16                             ` Neeraj Singh
  2022-03-10 14:01                               ` Ævar Arnfjörð Bjarmason
From: Neeraj Singh @ 2022-03-10  1:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong

On Wed, Mar 9, 2022 at 3:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> Replying to an old-ish E-Mail of mine with some more thought that came
> to mind after[1] (another recently resurrected fsync() thread).
>
> I wonder if there's another twist on the plan outlined in [2] that would
> be both portable & efficient, i.e. the "slow" POSIX way to write files
> A..Z is to open/write/close/fsync each one, so we'll trigger a HW flush
> N times.
>
> And as we've discussed, doing it just on Z will implicitly flush A..Y on
> common OSes in the wild, which we're taking advantage of here.
>
> But aside from the rename() dance in [2], what do those OSes do if you
> write A..Z, fsync() the "fd" for Z, and then fsync A..Y (or, presumably
> equivalently, in reverse order: Y..A)?
>
> I'd think they'd be smart enough to know that they already implicitly
> flushed that data since Z was flushed, and make those fsync() calls a
> rather cheap noop.
>
> But I don't know, hence the question.
>
> If that's true then perhaps it's a path towards having our cake and
> eating it too in some cases?
>
> I.e. an FS that would flush A..Y if we flush Z would do so quickly and
> reliably, whereas an FS that doesn't have such an optimization might be
> just as slow for all of A..Y, but at least it'll be safe.
>
> 1. https://lore.kernel.org/git/220309.867d93lztw.gmgdl@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/e1747ce00af7ab3170a69955b07d995d5321d6f3.1637020263.git.gitgitgadget@gmail.com/

The important angle here is that we need some way to indicate to the
OS what A..Y is before we fsync on Z.  I.e. the OS will cache any
writes in memory until some sync-ish operation is done on *that
specific file*.  Syncing just 'Z' with no sync operations on A..Y
doesn't indicate that A..Y would get written out.  Apparently the bad
old ext3 behavior was similar to what you're proposing where a sync on
'Z' would imply something about independent files.

Here's an interesting paper I recently came across that proposes the
interface we'd really want, 'syncv':
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.924.1168&rep=rep1&type=pdf.

Thanks,
Neeraj

* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2022-03-10  1:16                             ` Neeraj Singh
@ 2022-03-10 14:01                               ` Ævar Arnfjörð Bjarmason
  2022-03-10 17:52                                 ` Neeraj Singh
From: Ævar Arnfjörð Bjarmason @ 2022-03-10 14:01 UTC (permalink / raw)
  To: Neeraj Singh
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong


On Wed, Mar 09 2022, Neeraj Singh wrote:

> On Wed, Mar 9, 2022 at 3:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>> Replying to an old-ish E-Mail of mine with some more thought that came
>> to mind after[1] (another recently resurrected fsync() thread).
>>
>> I wonder if there's another twist on the plan outlined in [2] that would
>> be both portable & efficient, i.e. the "slow" POSIX way to write files
>> A..Z is to open/write/close/fsync each one, so we'll trigger a HW flush
>> N times.
>>
>> And as we've discussed, doing it just on Z will implicitly flush A..Y on
>> common OSes in the wild, which we're taking advantage of here.
>>
>> But aside from the rename() dance in [2], what do those OSes do if you
>> write A..Z, fsync() the "fd" for Z, and then fsync A..Y (or, presumably
>> equivalently, in reverse order: Y..A)?
>>
>> I'd think they'd be smart enough to know that they already implicitly
>> flushed that data since Z was flushed, and make those fsync() calls a
>> rather cheap noop.
>>
>> But I don't know, hence the question.
>>
>> If that's true then perhaps it's a path towards having our cake and
>> eating it too in some cases?
>>
>> I.e. an FS that would flush A..Y if we flush Z would do so quickly and
>> reliably, whereas an FS that doesn't have such an optimization might be
>> just as slow for all of A..Y, but at least it'll be safe.
>>
>> 1. https://lore.kernel.org/git/220309.867d93lztw.gmgdl@evledraar.gmail.com/
>> 2. https://lore.kernel.org/git/e1747ce00af7ab3170a69955b07d995d5321d6f3.1637020263.git.gitgitgadget@gmail.com/
>
> The important angle here is that we need some way to indicate to the
> OS what A..Y is before we fsync on Z.  I.e. the OS will cache any
> writes in memory until some sync-ish operation is done on *that
> specific file*.  Syncing just 'Z' with no sync operations on A..Y
> doesn't indicate that A..Y would get written out.  Apparently the bad
> old ext3 behavior was similar to what you're proposing where a sync on
> 'Z' would imply something about independent files.

It's certainly starting to sound like I'm misunderstanding this whole
thing, but just to clarify again I'm talking about the sort of loops
mentioned upthread in my [1]. I.e. you have (to copy from that E-Mail):

    bulk_checkin_start_make_cookie();
    n = 10;
    for i in 1..n:
        write_nth(i, fsync: 0);
    bulk_checkin_end_commit_cookie();

I.e. we have a "cookie" file in a given dir (where, in this example,
we'd also write files A..Z). I.e. we write:

    cookie
    {A..Z}
    cookie

And then only fsync() on the "cookie" at the end, which "flushes" the
A..Z updates on some FS's (again, all per my possibly-incorrect
understanding).

Which is why I proposed that in many/all cases we could do this,
i.e. just the same without the "cookie" file (which AFAICT isn't needed
per se, but was just added to make the API a bit simpler in not needing
to modify the relevant loops):

    all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
    end_fsync = bulk_checkin_mode() ? 1 : all_fsync;
    n = 10;
    for i in 1..n:
        write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);

I.e. we don't pay the cost of the fsync() as we're in the loop, but just
for the last file, which "flushes" the rest.

So far all of that's a paraphrasing of existing exchanges, but what I
was wondering now in [2] is if we add this to this last example above:

    for i in 1..n-1:
        fsync_nth(i)

Wouldn't those same OS's that are being clever about deferring the
syncing of A..Z as a "batch" be clever enough to turn that (re-)syncing
into a NOOP?

Of course in this case we'd need to keep the fd's open and be clever
about E[MN]FILE (i.e. "Too many open..."), or do an fsync() every Nth
for some reasonable Nth, e.g. somewhere in the 2^10..2^12 range.
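Keeping descriptors open but flushing in bounded windows could be sketched as below (POSIX C; write_windowed() and flush_window() are hypothetical names). Within each window the last-written file is fsync()ed first, followed by the rest, as in the A..Z question. The window size is a parameter because the default soft RLIMIT_NOFILE on some systems is below the upper end of the 2^10..2^12 range.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* fsync() (newest first) and then close fds[0..count-1]; returns 0 or -1. */
static int flush_window(const int *fds, int count)
{
	int i, ret = 0;

	for (i = count - 1; i >= 0; i--)
		if (fsync(fds[i]) < 0)
			ret = -1;
	for (i = 0; i < count; i++)
		if (close(fds[i]) < 0)
			ret = -1;
	return ret;
}

/*
 * Write n files, keeping at most `window` descriptors open; each time the
 * window fills (or after the last file) flush it. Returns the number of
 * windows flushed, or -1 on error.
 */
static int write_windowed(const char *dir, int n, int window)
{
	int *fds;
	int i, count = 0, flushes = 0;

	assert(n > 0 && window > 0);
	fds = malloc((size_t)window * sizeof(*fds));
	if (!fds)
		return -1;
	for (i = 0; i < n; i++) {
		char path[4096];
		int fd;

		snprintf(path, sizeof(path), "%s/obj-%d", dir, i);
		fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
		if (fd < 0)
			goto fail;
		fds[count++] = fd;
		if (write(fd, "x\n", 2) != 2)
			goto fail;
		if (count == window || i == n - 1) {
			int r = flush_window(fds, count);

			count = 0;
			if (r < 0)
				goto fail;
			flushes++;
		}
	}
	free(fds);
	return flushes;
fail:
	flush_window(fds, count);	/* best-effort cleanup of descriptors still open */
	free(fds);
	return -1;
}
```

For example, n = 10 with a window of 4 flushes three windows (4 + 4 + 2 files).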

But *if* this works it seems to me to be something we might be able to
enable when "core.fsyncObjectFiles" is configured on those systems.

I.e. the implicit assumption with that configuration was that if we sync
N loose objects and then update and fsync the ref that the FS would
queue up the ref update after the syncing of the loose objects.

This new "cookie" (or my suggested "fsync last of N") is basically
making the same assumption, just with the slight twist that some OSs/FSs
are known to behave like that on a per-subdir basis, no?

> Here's an interesting paper I recently came across that proposes the
> interface we'd really want, 'syncv':
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.924.1168&rep=rep1&type=pdf.

1. https://lore.kernel.org/git/211201.864k7sbdjt.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/220310.86lexilo3d.gmgdl@evledraar.gmail.com/

* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2022-03-10 14:01                               ` Ævar Arnfjörð Bjarmason
@ 2022-03-10 17:52                                 ` Neeraj Singh
  2022-03-10 18:08                                   ` rsbecker
From: Neeraj Singh @ 2022-03-10 17:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Randall S. Becker,
	Bagas Sanjaya, Elijah Newren, Neeraj K. Singh,
	Patrick Steinhardt, Junio C Hamano, Eric Wong

On Thu, Mar 10, 2022 at 6:17 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Wed, Mar 09 2022, Neeraj Singh wrote:
>
> > On Wed, Mar 9, 2022 at 3:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> >>
> >> Replying to an old-ish E-Mail of mine with some more thought that came
> >> to mind after[1] (another recently resurrected fsync() thread).
> >>
> >> I wonder if there's another twist on the plan outlined in [2] that would
> >> be both portable & efficient, i.e. the "slow" POSIX way to write files
> >> A..Z is to open/write/close/fsync each one, so we'll trigger a HW flush
> >> N times.
> >>
> >> And as we've discussed, doing it just on Z will implicitly flush A..Y on
> >> common OSes in the wild, which we're taking advantage of here.
> >>
> >> But aside from the rename() dance in [2], what do those OSes do if you
> >> write A..Z, fsync() the "fd" for Z, and then fsync A..Y (or, presumably
> >> equivalently, in reverse order: Y..A)?
> >>
> >> I'd think they'd be smart enough to know that they already implicitly
> >> flushed that data since Z was flushed, and make those fsync() calls a
> >> rather cheap noop.
> >>
> >> But I don't know, hence the question.
> >>
> >> If that's true then perhaps it's a path towards having our cake and
> >> eating it too in some cases?
> >>
> >> I.e. an FS that would flush A..Y if we flush Z would do so quickly and
> >> reliably, whereas an FS that doesn't have such an optimization might be
> >> just as slow for all of A..Y, but at least it'll be safe.
> >>
> >> 1. https://lore.kernel.org/git/220309.867d93lztw.gmgdl@evledraar.gmail.com/
> >> 2. https://lore.kernel.org/git/e1747ce00af7ab3170a69955b07d995d5321d6f3.1637020263.git.gitgitgadget@gmail.com/
> >
> > The important angle here is that we need some way to indicate to the
> > OS what A..Y is before we fsync on Z.  I.e. the OS will cache any
> > writes in memory until some sync-ish operation is done on *that
> > specific file*.  Syncing just 'Z' with no sync operations on A..Y
> > doesn't indicate that A..Y would get written out.  Apparently the bad
> > old ext3 behavior was similar to what you're proposing where a sync on
> > 'Z' would imply something about independent files.
>
> It's certainly starting to sound like I'm misunderstanding this whole
> thing, but just to clarify again I'm talking about the sort of loops
> mentioned upthread in my [1]. I.e. you have (to copy from that E-Mail):
>
>     bulk_checkin_start_make_cookie();
>     n = 10;
>     for i in 1..n:
>         write_nth(i, fsync: 0);
>     bulk_checkin_end_commit_cookie();
>
> I.e. we have a "cookie" file in a given dir (where, in this example,
> we'd also write files A..Z). I.e. we write:
>
>     cookie
>     {A..Z}
>     cookie
>
> And then only fsync() on the "cookie" at the end, which "flushes" the
> A..Z updates on some FS's (again, all per my possibly-incorrect
> understanding).
>
> Which is why I proposed that in many/all cases we could do this,
> i.e. just the same without the "cookie" file (which AFAICT isn't needed
> per se, but was just added to make the API a bit simpler in not needing
> to modify the relevant loops):
>
>     all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
>     end_fsync = bulk_checkin_mode() ? 1 : all_fsync;
>     n = 10;
>     for i in 1..n:
>         write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);
>
> I.e. we don't pay the cost of the fsync() as we're in the loop, but just
> for the last file, which "flushes" the rest.
>
> So far all of that's a paraphrasing of existing exchanges, but what I
> was wondering now in [2] is if we add this to this last example above:
>
>     for i in 1..n-1:
>         fsync_nth(i)
>
> Wouldn't those same OS's that are being clever about deferring the
> syncing of A..Z as a "batch" be clever enough to turn that (re-)syncing
> into a NOOP?
>
> Of course in this case we'd need to keep the fd's open and be clever
> about E[MN]FILE (i.e. "Too many open..."), or do an fsync() every Nth
> for some reasonable Nth, e.g. somewhere in the 2^10..2^12 range.
>
> But *if* this works it seems to me to be something we might be able to
> enable when "core.fsyncObjectFiles" is configured on those systems.
>
> I.e. the implicit assumption with that configuration was that if we sync
> N loose objects and then update and fsync the ref that the FS would
> queue up the ref update after the syncing of the loose objects.
>
> This new "cookie" (or my suggested "fsync last of N") is basically
> making the same assumption, just with the slight twist that some OSs/FSs
> are known to behave like that on a per-subdir basis, no?
>
> > Here's an interesting paper I recently came across that proposes the
> > interface we'd really want, 'syncv':
> > https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.924.1168&rep=rep1&type=pdf.
>
> 1. https://lore.kernel.org/git/211201.864k7sbdjt.gmgdl@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/220310.86lexilo3d.gmgdl@evledraar.gmail.com/

On the actual FS implementations in the three common OSes I'm familiar
with (macOS, Windows, Linux), each file has its own independent data
caching in OS memory.  Fsyncing one of them doesn't necessarily imply
writing out the OS cache for any other file, except, apparently, on ext3
in data=ordered mode, but that FS is no longer common.  On Linux, we use
sync_file_range to get the OS to write the in-memory cache to the storage
hardware, which is what makes the data 'available' to fsync.

Now, we could consider an implementation where we call sync_file_range
without the wait flags (i.e. without SYNC_FILE_RANGE_WAIT_BEFORE and
SYNC_FILE_RANGE_WAIT_AFTER). Then we could later fsync every file (or batch
of files), which might be more efficient if the OS coalesces the disk cache
flushes. I expect, however, that this method is less likely to give us the
desired performance on common Linux FSes.

The macOS and Windows APIs are defined a bit differently from Linux. In both
of those OSes, we're actually calling fsync-equivalent APIs that are defined
to write back all the relevant data and metadata, just without the storage
cache flush.

So to summarize:
1. We need to do write(2) to get the data out of Git and into the OS
   filesystem cache.
2. We need some API (macOS: fsync, Windows: NtFlushBuffersFileEx, Linux:
   sync_file_range) to transfer the data per-file to the storage controller,
   but without flushing the storage controller.
3. We need some API (macOS: F_FULLFSYNC, Windows: NtFlushBuffersFile, Linux:
   fsync) to push the storage controller cache to durable media. This only
   needs to be done once, at the end, to push out the data made available in
   step (2).


Thanks,
Neeraj

* RE: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2022-03-10 17:52                                 ` Neeraj Singh
@ 2022-03-10 18:08                                   ` rsbecker
  2022-03-10 18:43                                     ` Neeraj Singh
  0 siblings, 1 reply; 160+ messages in thread
From: rsbecker @ 2022-03-10 18:08 UTC (permalink / raw)
  To: 'Neeraj Singh', 'Ævar Arnfjörð Bjarmason'
  Cc: 'Neeraj K. Singh via GitGitGadget', 'Git List',
	'Johannes Schindelin', 'Jeff King',
	'Jeff Hostetler', 'Christoph Hellwig',
	'Bagas Sanjaya', 'Elijah Newren',
	'Neeraj K. Singh', 'Patrick Steinhardt',
	'Junio C Hamano', 'Eric Wong'

On March 10, 2022 12:53 PM, Neeraj Singh wrote:
>On Thu, Mar 10, 2022 at 6:17 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>>
>> On Wed, Mar 09 2022, Neeraj Singh wrote:
>>
>> > On Wed, Mar 9, 2022 at 3:10 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> >>
>> >> Replying to an old-ish E-Mail of mine with some more thought that
>> >> came to mind after[1] (another recently resurrected fsync() thread).
>> >>
>> >> I wonder if there's another twist on the plan outlined in [2] that
>> >> would be both portable & efficient, i.e. the "slow" POSIX way to
>> >> write files A..Z is to open/write/close/fsync each one, so we'll
>> >> trigger a HW flush N times.
>> >>
>> >> And as we've discussed, doing it just on Z will implicitly flush
>> >> A..Y on common OS's in the wild, which we're taking advantage of here.
>> >>
>> >> But aside from the rename() dance in[2], what do those OS's do if
>> >> you write A..Z, fsync() the "fd" for Z, and then fsync A..Y (or,
>> >> presumably equivalently, in reverse order: Y..A).
>> >>
>> >> I'd think they'd be smart enough to know that they already
>> >> implicitly flushed that data since Z was flushed, and make those
>> >> fsync()'s a rather cheap noop.
>> >>
>> >> But I don't know, hence the question.
>> >>
>> >> If that's true then perhaps it's a path towards having our cake and
>> >> eating it too in some cases?
>> >>
>> >> I.e. an FS that would flush A..Y if we flush Z would do so quickly
>> >> and reliably, whereas a FS that doesn't have such an optimization
>> >> might be just as slow for all of A..Y, but at least it'll be safe.
>> >>
>> >> 1. https://lore.kernel.org/git/220309.867d93lztw.gmgdl@evledraar.gmail.com/
>> >> 2. https://lore.kernel.org/git/e1747ce00af7ab3170a69955b07d995d5321d6f3.1637020263.git.gitgitgadget@gmail.com/
>> >
>> > The important angle here is that we need some way to indicate to the
>> > OS what A..Y is before we fsync on Z.  I.e. the OS will cache any
>> > writes in memory until some sync-ish operation is done on *that
>> > specific file*.  Syncing just 'Z' with no sync operations on A..Y
>> > doesn't indicate that A..Y would get written out.  Apparently the
>> > bad old ext3 behavior was similar to what you're proposing where a
>> > sync on 'Z' would imply something about independent files.
>>
>> It's certainly starting to sound like I'm misunderstanding this whole
>> thing, but just to clarify again I'm talking about the sort of loops
>> mentioned upthread in my [1]. I.e. you have (to copy from that E-Mail):
>>
>>     bulk_checkin_start_make_cookie():
>>     n = 10
>>     for i in 1..n:
>>         write_nth(i, fsync: 0);
>>     bulk_checkin_end_commit_cookie();
>>
>> I.e. we have a "cookie" file in a given dir (where, in this example,
>> we'd also write files A..Z). I.e. we write:
>>
>>     cookie
>>     {A..Z}
>>     cookie
>>
>> And then only fsync() on the "cookie" at the end, which "flushes" the
>> A..Z updates on some FS's (again, all per my possibly-incorrect
>> understanding).
>>
>> Which is why I proposed that in many/all cases we could do this, i.e.
>> just the same without the "cookie" file (which AFAICT isn't needed
>> per-se, but was just added to make the API a bit simpler in not
>> needing to modify the relevant loops):
>>
>>     all_fsync = bulk_checkin_mode() ? 0 : fsync_turned_on_in_general();
>>     end_fsync = bulk_checkin_mode() ? 1 : all_fsync;
>>     n = 10;
>>     for i in 1..n:
>>         write_nth(i, fsync: (i == n) ? end_fsync : all_fsync);
>>
>> I.e. we don't pay the cost of the fsync() as we're in the loop, but
>> just for the last file, which "flushes" the rest.
>>
>> So far all of that's a paraphrasing of existing exchanges, but what I
>> was wondering now in[2] is if we add this to this last example above:
>>
>>     for i in 1..n-1:
>>         fsync_nth(i)
>>
>> Wouldn't those same OS's that are being clever about deferring the
>> syncing of A..Z as a "batch" be clever enough to turn that
>> (re-)syncing into a NOOP?
>>
>> Of course in this case we'd need to keep the fd's open and be clever
>> about E[MN]FILE (i.e. "Too many open..."), or do an fsync() every Nth
>> for some reasonable Nth, e.g. somewhere in the 2^10..2^12 range.
>>
>> But *if* this works it seems to me to be something we might be able to
>> enable when "core.fsyncObjectFiles" is configured on those systems.
>>
>> I.e. the implicit assumption with that configuration was that if we
>> sync N loose objects and then update and fsync the ref that the FS
>> would queue up the ref update after the syncing of the loose objects.
>>
>> This new "cookie" (or my suggested "fsync last of N") is basically
>> making the same assumption, just with the slight twist that some
>> OSs/FSs are known to behave like that on a per-subdir basis, no?
>>
>> > Here's an interesting paper I recently came across that proposes the
>> > interface we'd really want, 'syncv':
>> > https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.924.1168&rep=rep1&type=pdf.
>>
>> 1. https://lore.kernel.org/git/211201.864k7sbdjt.gmgdl@evledraar.gmail.com/
>> 2. https://lore.kernel.org/git/220310.86lexilo3d.gmgdl@evledraar.gmail.com/
>
>On the actual FS implementations in the three common OSes I'm familiar with
>(macOS, Windows, Linux), each file has its own independent data caching in OS
>memory. Fsyncing one of them doesn't necessarily imply writing out the OS
>cache for any other file. The exception, apparently, is ext3 in data=ordered
>mode, but that FS is no longer common. On Linux, we use sync_file_range to
>get the OS to write the in-memory cache to the storage hardware, which is
>what makes the data 'available' to fsync.
>
>Now, we could consider an implementation where we call sync_file_range
>without the wait flags (i.e. without SYNC_FILE_RANGE_WAIT_BEFORE and
>SYNC_FILE_RANGE_WAIT_AFTER). Then we could later fsync every file (or batch
>of files), which might be more efficient if the OS coalesces the disk cache
>flushes. I expect, however, that this method is less likely to give us the
>desired performance on common Linux FSes.
>
>The macOS and Windows APIs are defined a bit differently from Linux. In both
>of those OSes, we're actually calling fsync-equivalent APIs that are defined
>to write back all the relevant data and metadata, just without the storage
>cache flush.
>
>So to summarize:
>1. We need to do write(2) to get the data out of Git and into the OS
>   filesystem cache.
>2. We need some API (macOS: fsync, Windows: NtFlushBuffersFileEx, Linux:
>   sync_file_range) to transfer the data per-file to the storage controller,
>   but without flushing the storage controller.
>3. We need some API (macOS: F_FULLFSYNC, Windows: NtFlushBuffersFile, Linux:
>   fsync) to push the storage controller cache to durable media. This only
>   needs to be done once, at the end, to push out the data made available in
>   step (2).

While this might not be a surprise, on some platforms fsync is a thread-blocking operation. When the OS has kernel threads, fsync can potentially cause multiple processes (if implemented that way) to block, particularly where an fd is shared across threads (and thus processes), which may end up causing a deadlock. We might need to keep an eye out for this type of situation in the future and at least try to test for it. I cannot actually see a situation where this would occur in git, but that does not mean it is impossible. Food for thought.
--Randall


* Re: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2022-03-10 18:08                                   ` rsbecker
@ 2022-03-10 18:43                                     ` Neeraj Singh
  2022-03-10 18:48                                       ` rsbecker
  0 siblings, 1 reply; 160+ messages in thread
From: Neeraj Singh @ 2022-03-10 18:43 UTC (permalink / raw)
  To: Randall S. Becker
  Cc: Ævar Arnfjörð Bjarmason,
	Neeraj K. Singh via GitGitGadget, Git List, Johannes Schindelin,
	Jeff King, Jeff Hostetler, Christoph Hellwig, Bagas Sanjaya,
	Elijah Newren, Neeraj K. Singh, Patrick Steinhardt,
	Junio C Hamano, Eric Wong

On Thu, Mar 10, 2022 at 10:08 AM <rsbecker@nexbridge.com> wrote:
> While this might not be a surprise, on some platforms fsync is a thread-blocking operation. When the OS has kernel threads, fsync can potentially cause multiple processes (if implemented that way) to block, particularly where an fd is shared across threads (and thus processes), which may end up causing a deadlock. We might need to keep an eye out for this type of situation in the future and at least try to test for it. I cannot actually see a situation where this would occur in git, but that does not mean it is impossible. Food for thought.
> --Randall

fsync is expected to block the calling thread until the underlying
data is durable.  Unless the OS somehow depends on the git process to
make progress before fsync can complete, there should be no deadlock,
since there would be no cycle in the waiting graph.  This could be a
problem for FUSE implementations that are backed by Git, but they
already have to deal with that possibility today and this patch series
doesn't change anything.

* RE: [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles
  2022-03-10 18:43                                     ` Neeraj Singh
@ 2022-03-10 18:48                                       ` rsbecker
  0 siblings, 0 replies; 160+ messages in thread
From: rsbecker @ 2022-03-10 18:48 UTC (permalink / raw)
  To: 'Neeraj Singh'
  Cc: 'Ævar Arnfjörð Bjarmason',
	'Neeraj K. Singh via GitGitGadget', 'Git List',
	'Johannes Schindelin', 'Jeff King',
	'Jeff Hostetler', 'Christoph Hellwig',
	'Bagas Sanjaya', 'Elijah Newren',
	'Neeraj K. Singh', 'Patrick Steinhardt',
	'Junio C Hamano', 'Eric Wong'

On March 10, 2022 1:43 PM, Neeraj Singh wrote:
>On Thu, Mar 10, 2022 at 10:08 AM <rsbecker@nexbridge.com> wrote:
>> While this might not be a surprise, on some platforms fsync is a thread-blocking
>operation. When the OS has kernel threads, fsync can potentially cause multiple
>processes (if implemented that way) to block, particularly where an fd is shared
>across threads (and thus processes), which may end up causing a deadlock. We
>might need to keep an eye out for this type of situation in the future and at least
>try to test for it. I cannot actually see a situation where this would occur in git, but
>that does not mean it is impossible. Food for thought.
>> --Randall
>
>fsync is expected to block the calling thread until the underlying data is durable.
>Unless the OS somehow depends on the git process to make progress before
>fsync can complete, there should be no deadlock, since there would be no cycle in
>the waiting graph.  This could be a problem for FUSE implementations that are
>backed by Git, but they already have to deal with that possibility today and this
>patch series doesn't change anything.

That assumption is based on a specific threading model. In cooperative user-thread models, fsync is process-blocking. While fsync is, by spec, required to block the calling thread, the spec places no limits on blocking everything else. In some systems, an fsync can block the entire file system. Just pointing that out.


Thread overview: 160+ messages
2021-08-25  1:51 [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-08-25  1:51 ` [PATCH 1/2] object-file: use futimes rather than utime Neeraj Singh via GitGitGadget
2021-08-25 13:51   ` Johannes Schindelin
2021-08-25 22:08     ` Neeraj Singh
2021-08-25  1:51 ` [PATCH 2/2] core.fsyncobjectfiles: batch disk flushes Neeraj Singh via GitGitGadget
2021-08-25  5:38   ` Christoph Hellwig
2021-08-25 17:40     ` Neeraj Singh
2021-08-26  5:54       ` Christoph Hellwig
2021-08-25 16:11   ` Ævar Arnfjörð Bjarmason
2021-08-26  0:49     ` Neeraj Singh
2021-08-26  5:50       ` Christoph Hellwig
2021-08-28  0:20         ` Neeraj Singh
2021-08-28  6:57           ` Christoph Hellwig
2021-08-31 19:59             ` Neeraj Singh
2021-09-01  5:09               ` Christoph Hellwig
2021-08-26  5:57     ` Christoph Hellwig
2021-08-25 18:52   ` Johannes Schindelin
2021-08-25 21:26     ` Junio C Hamano
2021-08-26  1:19     ` Neeraj Singh
2021-08-25 16:58 ` [PATCH 0/2] [RFC] Implement a bulk-checkin option for core.fsyncObjectFiles Neeraj Singh
2021-08-27 23:49 ` [PATCH v2 0/6] Implement a batched fsync " Neeraj K. Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 1/6] object-file: use futimens rather than utime Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 2/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 3/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 4/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 5/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-08-27 23:49   ` [PATCH v2 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-07 19:44   ` [PATCH v2 0/6] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-07 19:50     ` Ævar Arnfjörð Bjarmason
2021-09-07 19:54     ` Randall S. Becker
2021-09-08  0:54       ` Neeraj Singh
2021-09-08  1:22         ` Ævar Arnfjörð Bjarmason
2021-09-08 14:04           ` Randall S. Becker
2021-09-08 19:01           ` Neeraj Singh
2021-09-08  0:55     ` Neeraj Singh
2021-09-08  6:44       ` Junio C Hamano
2021-09-08  6:49         ` Christoph Hellwig
2021-09-08 13:57           ` Randall S. Becker
2021-09-08 14:13             ` 'Christoph Hellwig'
2021-09-08 14:25               ` Randall S. Becker
2021-09-08 16:34         ` Neeraj Singh
2021-09-08 19:12           ` Junio C Hamano
2021-09-08 19:20             ` Neeraj Singh
2021-09-08 19:23           ` Ævar Arnfjörð Bjarmason
2021-09-14  3:38   ` [PATCH v3 " Neeraj K. Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-14 10:39       ` Bagas Sanjaya
2021-09-14 19:05         ` Neeraj Singh
2021-09-14 19:34       ` Junio C Hamano
2021-09-14 20:33         ` Junio C Hamano
2021-09-15  4:55         ` Neeraj Singh
2021-09-14  3:38     ` [PATCH v3 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-14 19:35       ` Junio C Hamano
2021-09-14  3:38     ` [PATCH v3 5/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-14  3:38     ` [PATCH v3 6/6] core.fsyncobjectfiles: enable batch mode for testing Neeraj Singh via GitGitGadget
2021-09-15 16:21       ` Junio C Hamano
2021-09-15 22:43         ` Neeraj Singh
2021-09-15 23:12           ` Junio C Hamano
2021-09-16  6:19             ` Junio C Hamano
2021-09-14  5:49     ` [PATCH v3 0/6] Implement a batched fsync option for core.fsyncObjectFiles Christoph Hellwig
2021-09-20 22:15     ` [PATCH v4 " Neeraj K. Singh via GitGitGadget
2021-09-20 22:15       ` [PATCH v4 1/6] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-20 22:15       ` [PATCH v4 2/6] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-21 23:16         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:23           ` Neeraj Singh
2021-09-22  2:02             ` Ævar Arnfjörð Bjarmason
2021-09-22 19:46             ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 3/6] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:42         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:23           ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 4/6] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-21 23:46         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:27           ` Neeraj Singh
2021-09-23 22:32             ` Neeraj Singh
2021-09-20 22:15       ` [PATCH v4 5/6] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-21 23:54         ` Ævar Arnfjörð Bjarmason
2021-09-22  1:30           ` Neeraj Singh
2021-09-22  1:58             ` Ævar Arnfjörð Bjarmason
2021-09-22 17:55               ` Neeraj Singh
2021-09-22 20:01                 ` Ævar Arnfjörð Bjarmason
2021-09-20 22:15       ` [PATCH v4 6/6] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 20:12       ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 1/7] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 2/7] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 3/7] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-24 21:47           ` Neeraj Singh
2021-09-24 20:12         ` [PATCH v5 4/7] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 21:49           ` Neeraj Singh
2021-09-24 20:12         ` [PATCH v5 5/7] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 6/7] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 20:12         ` [PATCH v5 7/7] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-24 23:31         ` [PATCH v5 0/7] Implement a batched fsync option for core.fsyncObjectFiles Neeraj Singh
2021-09-24 23:53         ` [PATCH v6 0/8] " Neeraj K. Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 1/8] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 2/8] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 3/8] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-25  3:15             ` Bagas Sanjaya
2021-09-27  0:27               ` Neeraj Singh
2021-09-24 23:53           ` [PATCH v6 4/8] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-27 20:07             ` Junio C Hamano
2021-09-27 20:55               ` Neeraj Singh
2021-09-27 21:03                 ` Neeraj Singh
2021-09-27 23:53                   ` Junio C Hamano
2021-09-24 23:53           ` [PATCH v6 5/8] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 6/8] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 7/8] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-24 23:53           ` [PATCH v6 8/8] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-09-28 23:32           ` [PATCH v7 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 1/9] object-file.c: do not rename in a temp odb Neeraj Singh via GitGitGadget
2021-09-28 23:55               ` Jeff King
2021-09-29  0:10                 ` Neeraj Singh
2021-09-28 23:32             ` [PATCH v7 2/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-09-29  8:41               ` Elijah Newren
2021-09-29 16:40                 ` Neeraj Singh
2021-09-28 23:32             ` [PATCH v7 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-09-28 23:32             ` [PATCH v7 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-10-04 16:57             ` [PATCH v8 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-10-04 16:57               ` [PATCH v8 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-15 23:50               ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Neeraj K. Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 1/9] tmp-objdir: new API for creating temporary writable databases Neeraj Singh via GitGitGadget
2021-11-30 21:27                   ` Elijah Newren
2021-11-30 21:52                     ` Neeraj Singh
2021-11-30 22:36                       ` Elijah Newren
2021-11-15 23:50                 ` [PATCH v9 2/9] tmp-objdir: disable ref updates when replacing the primary odb Neeraj Singh via GitGitGadget
2021-11-16  7:23                   ` Ævar Arnfjörð Bjarmason
2021-11-16 20:38                     ` Neeraj Singh
2021-11-15 23:50                 ` [PATCH v9 3/9] bulk-checkin: rename 'state' variable and separate 'plugged' boolean Neeraj Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 4/9] core.fsyncobjectfiles: batched disk flushes Neeraj Singh via GitGitGadget
2021-11-15 23:50                 ` [PATCH v9 5/9] core.fsyncobjectfiles: add windows support for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 6/9] update-index: use the bulk-checkin infrastructure Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 7/9] unpack-objects: " Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 8/9] core.fsyncobjectfiles: tests for batch mode Neeraj Singh via GitGitGadget
2021-11-15 23:51                 ` [PATCH v9 9/9] core.fsyncobjectfiles: performance tests for add and stash Neeraj Singh via GitGitGadget
2021-11-16  8:02                 ` [PATCH v9 0/9] Implement a batched fsync option for core.fsyncObjectFiles Ævar Arnfjörð Bjarmason
2021-11-17  7:06                   ` Neeraj Singh
2021-11-17  7:24                     ` Ævar Arnfjörð Bjarmason
2021-11-18  5:03                       ` Neeraj Singh
2021-12-01 14:15                         ` Ævar Arnfjörð Bjarmason
2022-03-09 23:02                           ` Ævar Arnfjörð Bjarmason
2022-03-10  1:16                             ` Neeraj Singh
2022-03-10 14:01                               ` Ævar Arnfjörð Bjarmason
2022-03-10 17:52                                 ` Neeraj Singh
2022-03-10 18:08                                   ` rsbecker
2022-03-10 18:43                                     ` Neeraj Singh
2022-03-10 18:48                                       ` rsbecker
