Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff Hostetler via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 19/23] fsmonitor--daemon: use a cookie file to sync with file system
Date: Tue, 27 Apr 2021 10:23:54 -0400
Message-ID: <5669d313-4084-dceb-1204-acd005e58657@gmail.com> (raw)
In-Reply-To: <038b62dc6744e8d8004f3f8423a40066821c0f1b.1617291666.git.gitgitgadget@gmail.com>

On 4/1/2021 11:41 AM, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Teach fsmonitor--daemon client threads to create a cookie file
> inside the .git directory and then wait until FS events for the
> cookie are observed by the FS listener thread.
> 
> This helps address the racy nature of file system events by
> blocking the client response until the kernel has drained any
> event backlog.

This description matches my expectation of the cookie file,
which furthers my confusion about GIT_TEST_FSMONITOR_CLIENT_DELAY.

> +enum fsmonitor_cookie_item_result {
> +	FCIR_ERROR = -1, /* could not create cookie file ? */
> +	FCIR_INIT = 0,
> +	FCIR_SEEN,
> +	FCIR_ABORT,
> +};
> +
> +struct fsmonitor_cookie_item {
> +	struct hashmap_entry entry;
> +	const char *name;
> +	enum fsmonitor_cookie_item_result result;
> +};
> +
> +static int cookies_cmp(const void *data, const struct hashmap_entry *he1,
> +		     const struct hashmap_entry *he2, const void *keydata)

I'm interested to see why a hashset is necessary.

> +static enum fsmonitor_cookie_item_result fsmonitor_wait_for_cookie(
> +	struct fsmonitor_daemon_state *state)
> +{
> +	int fd;
> +	struct fsmonitor_cookie_item cookie;
> +	struct strbuf cookie_pathname = STRBUF_INIT;
> +	struct strbuf cookie_filename = STRBUF_INIT;
> +	const char *slash;
> +	int my_cookie_seq;
> +
> +	pthread_mutex_lock(&state->main_lock);

Hm. We are entering a locked region. I hope this is only for the
cookie write and not the entire waiting period.

> +	my_cookie_seq = state->cookie_seq++;
> +
> +	strbuf_addbuf(&cookie_pathname, &state->path_cookie_prefix);
> +	strbuf_addf(&cookie_pathname, "%i-%i", getpid(), my_cookie_seq);
> +
> +	slash = find_last_dir_sep(cookie_pathname.buf);
> +	if (slash)
> +		strbuf_addstr(&cookie_filename, slash + 1);
> +	else
> +		strbuf_addbuf(&cookie_filename, &cookie_pathname);

This business about the slash-or-not-slash is good defensive
programming. I imagine the only possible way for there to not
be a slash is if the Git process is running with the .git
directory as its working directory?

> +	cookie.name = strbuf_detach(&cookie_filename, NULL);
> +	cookie.result = FCIR_INIT;
> +	// TODO should we have case-insenstive hash (and in cookie_cmp()) ??

This TODO comment should be cleaned up. Doesn't match C-style, either.

As for the question, I believe that we can limit ourselves to names that
don't need case-insensitive hashes and trust that the filesystem will not
change the case. Using lowercase letters should help with this.

> +	hashmap_entry_init(&cookie.entry, strhash(cookie.name));
> +
> +	/*
> +	 * Warning: we are putting the address of a stack variable into a
> +	 * global hashmap.  This feels dodgy.  We must ensure that we remove
> +	 * it before this thread and stack frame returns.
> +	 */
> +	hashmap_add(&state->cookies, &cookie.entry);

I saw this warning and thought about avoiding it by using the heap, but
even with a heap pointer we need to be careful to remove the result
before returning and stopping the thread.

However, there is likely a higher potential of a bug leading to a
security issue through an error causing stack corruption and unsafe
code execution. Perhaps it is worth converting to using heap data here.

> +	trace_printf_key(&trace_fsmonitor, "cookie-wait: '%s' '%s'",
> +			 cookie.name, cookie_pathname.buf);
> +
> +	/*
> +	 * Create the cookie file on disk and then wait for a notification
> +	 * that the listener thread has seen it.
> +	 */
> +	fd = open(cookie_pathname.buf, O_WRONLY | O_CREAT | O_EXCL, 0600);
> +	if (fd >= 0) {
> +		close(fd);
> +		unlink_or_warn(cookie_pathname.buf);

Interesting that we are ignoring the warning here. Is it possible that
these cookie files will continue to grow if this unlink fails?

> +
> +		while (cookie.result == FCIR_INIT)
> +			pthread_cond_wait(&state->cookies_cond,
> +					  &state->main_lock);

Ok, we are waiting here for another thread to signal that the cookie
file has been found in the events. What happens if the event gets lost?
I'll look for a later signal that cookie.result can change based on a
timeout, too.

> +
> +		hashmap_remove(&state->cookies, &cookie.entry, NULL);
> +	} else {
> +		error_errno(_("could not create fsmonitor cookie '%s'"),
> +			    cookie.name);
> +
> +		cookie.result = FCIR_ERROR;
> +		hashmap_remove(&state->cookies, &cookie.entry, NULL);
> +	}

Both blocks here remove the cookie entry, so move it to the end of the
method with the other cleanups.

> +
> +	pthread_mutex_unlock(&state->main_lock);

Hm. We are locking the main state throughout this process. I suppose that
the listener thread could be watching multiple repos and updating them
while we wait here for one repo to update. This is a larger lock window
than I was hoping for, but I don't currently see how to reduce it safely.

> +
> +	free((char*)cookie.name);
> +	strbuf_release(&cookie_pathname);
> +	return cookie.result;

Remove the cookie from the hashset along with these lines.

> +}
> +
> +/*
> + * Mark these cookies as _SEEN and wake up the corresponding client threads.
> + */
> +static void fsmonitor_cookie_mark_seen(struct fsmonitor_daemon_state *state,
> +				       const struct string_list *cookie_names)
> +{
> +	/* assert state->main_lock */

I'm now confused what this is trying to document. The 'state' should be
locked by another thread while we are waiting for a cookie response, so
this method is updating the cookie as seen from a different thread that
doesn't have the lock.

...
> +/*
> + * Set _ABORT on all pending cookies and wake up all client threads.
> + */
> +static void fsmonitor_cookie_abort_all(struct fsmonitor_daemon_state *state)
...

> + * [2] Some of those lost events may have been for cookie files.  We
> + *     should assume the worst and abort them rather letting them starve.
> + *
>   * If there are no readers of the the current token data series, we
>   * can free it now.  Otherwise, let the last reader free it.  Either
>   * way, the old token data series is no longer associated with our
> @@ -454,6 +600,8 @@ void fsmonitor_force_resync(struct fsmonitor_daemon_state *state)
>  			 state->current_token_data->token_id.buf,
>  			 new_one->token_id.buf);
>  
> +	fsmonitor_cookie_abort_all(state);
> +

I see we abort here if we force a resync. I lost the detail of whether
this is triggered by a timeout, too.

> @@ -654,6 +803,39 @@ static int do_handle_client(struct fsmonitor_daemon_state *state,
>  		goto send_trivial_response;
>  	}
>  
> +	pthread_mutex_unlock(&state->main_lock);
> +
> +	/*
> +	 * Write a cookie file inside the directory being watched in an
> +	 * effort to flush out existing filesystem events that we actually
> +	 * care about.  Suspend this client thread until we see the filesystem
> +	 * events for this cookie file.
> +	 */
> +	cookie_result = fsmonitor_wait_for_cookie(state);

Odd that we unlock before calling this method, then just take the lock
again inside of it.

> +	if (cookie_result != FCIR_SEEN) {
> +		error(_("fsmonitor: cookie_result '%d' != SEEN"),
> +		      cookie_result);
> +		result = 0;
> +		goto send_trivial_response;
> +	}
> +
> +	pthread_mutex_lock(&state->main_lock);
> +
> +	if (strcmp(requested_token_id.buf,
> +		   state->current_token_data->token_id.buf)) {
> +		/*
> +		 * Ack! The listener thread lost sync with the filesystem
> +		 * and created a new token while we were waiting for the
> +		 * cookie file to be created!  Just give up.
> +		 */
> +		pthread_mutex_unlock(&state->main_lock);
> +
> +		trace_printf_key(&trace_fsmonitor,
> +				 "lost filesystem sync");
> +		result = 0;
> +		goto send_trivial_response;
> +	}
> +
>  	/*
>  	 * We're going to hold onto a pointer to the current
>  	 * token-data while we walk the list of batches of files.
> @@ -982,6 +1164,9 @@ void fsmonitor_publish(struct fsmonitor_daemon_state *state,
>  		}
>  	}
>  
> +	if (cookie_names->nr)
> +		fsmonitor_cookie_mark_seen(state, cookie_names);
> +

I was confused as to what updates 'cookie_names', but it appears that
these are updated in the platform-specific code. That seems to happen
in later patches.

Thanks,
-Stolee

  reply index

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-01 15:40 [PATCH 00/23] [RFC] Builtin FSMonitor Feature Jeff Hostetler via GitGitGadget
2021-04-01 15:40 ` [PATCH 01/23] fsmonitor--daemon: man page and documentation Jeff Hostetler via GitGitGadget
2021-04-26 14:13   ` Derrick Stolee
2021-04-28 13:54     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 02/23] fsmonitor-ipc: create client routines for git-fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-04-26 14:31   ` Derrick Stolee
2021-04-26 20:20     ` Eric Sunshine
2021-04-26 21:02       ` Derrick Stolee
2021-04-28 19:26     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 03/23] config: FSMonitor is repository-specific Johannes Schindelin via GitGitGadget
2021-04-01 15:40 ` [PATCH 04/23] fsmonitor: introduce `core.useBuiltinFSMonitor` to call the daemon via IPC Johannes Schindelin via GitGitGadget
2021-04-26 14:56   ` Derrick Stolee
2021-04-27  9:20     ` Ævar Arnfjörð Bjarmason
2021-04-27 12:42       ` Derrick Stolee
2021-04-28  7:59         ` Ævar Arnfjörð Bjarmason
2021-04-28 16:26           ` [PATCH] repo-settings.c: simplify the setup Ævar Arnfjörð Bjarmason
2021-04-28 19:09             ` Nesting topics within other threads (was: [PATCH] repo-settings.c: simplify the setup) Derrick Stolee
2021-04-28 23:01               ` Ævar Arnfjörð Bjarmason
2021-05-05 16:12                 ` Johannes Schindelin
2021-04-29  5:12               ` Nesting topics within other threads Junio C Hamano
2021-04-29 12:14                 ` Ævar Arnfjörð Bjarmason
2021-04-29 20:14                   ` Jeff King
2021-04-30  0:07                   ` Junio C Hamano
2021-04-30 14:23     ` [PATCH 04/23] fsmonitor: introduce `core.useBuiltinFSMonitor` to call the daemon via IPC Jeff Hostetler
2021-04-01 15:40 ` [PATCH 05/23] fsmonitor--daemon: add a built-in fsmonitor daemon Jeff Hostetler via GitGitGadget
2021-04-26 15:08   ` Derrick Stolee
2021-04-26 15:45     ` Derrick Stolee
2021-04-30 14:31       ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 06/23] fsmonitor--daemon: implement client command options Jeff Hostetler via GitGitGadget
2021-04-26 15:12   ` Derrick Stolee
2021-04-30 14:33     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 07/23] fsmonitor-fs-listen-win32: stub in backend for Windows Jeff Hostetler via GitGitGadget
2021-04-26 15:23   ` Derrick Stolee
2021-04-01 15:40 ` [PATCH 08/23] fsmonitor-fs-listen-macos: stub in backend for MacOS Jeff Hostetler via GitGitGadget
2021-04-01 15:40 ` [PATCH 09/23] fsmonitor--daemon: implement daemon command options Jeff Hostetler via GitGitGadget
2021-04-26 15:47   ` Derrick Stolee
2021-04-26 16:12     ` Derrick Stolee
2021-04-30 15:18       ` Jeff Hostetler
2021-04-30 15:59     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 10/23] fsmonitor--daemon: add pathname classification Jeff Hostetler via GitGitGadget
2021-04-26 19:17   ` Derrick Stolee
2021-04-26 20:11     ` Eric Sunshine
2021-04-26 20:24       ` Derrick Stolee
2021-04-01 15:40 ` [PATCH 11/23] fsmonitor--daemon: define token-ids Jeff Hostetler via GitGitGadget
2021-04-26 19:49   ` Derrick Stolee
2021-04-26 20:01     ` Eric Sunshine
2021-04-26 20:03       ` Derrick Stolee
2021-04-30 16:17     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 12/23] fsmonitor--daemon: create token-based changed path cache Jeff Hostetler via GitGitGadget
2021-04-26 20:22   ` Derrick Stolee
2021-04-30 17:36     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 13/23] fsmonitor-fs-listen-win32: implement FSMonitor backend on Windows Jeff Hostetler via GitGitGadget
2021-04-27 17:22   ` Derrick Stolee
2021-04-27 17:41     ` Eric Sunshine
2021-04-30 19:32     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 14/23] fsmonitor-fs-listen-macos: add macos header files for FSEvent Jeff Hostetler via GitGitGadget
2021-04-27 18:13   ` Derrick Stolee
2021-04-01 15:40 ` [PATCH 15/23] fsmonitor-fs-listen-macos: implement FSEvent listener on MacOS Jeff Hostetler via GitGitGadget
2021-04-27 18:35   ` Derrick Stolee
2021-04-30 20:05     ` Jeff Hostetler
2021-04-01 15:40 ` [PATCH 16/23] fsmonitor--daemon: implement handle_client callback Jeff Hostetler via GitGitGadget
2021-04-26 21:01   ` Derrick Stolee
2021-05-03 15:04     ` Jeff Hostetler
2021-05-13 18:52   ` Derrick Stolee
2021-04-01 15:40 ` [PATCH 17/23] fsmonitor--daemon: periodically truncate list of modified files Jeff Hostetler via GitGitGadget
2021-04-27 13:24   ` Derrick Stolee
2021-04-01 15:41 ` [PATCH 18/23] fsmonitor--daemon:: introduce client delay for testing Jeff Hostetler via GitGitGadget
2021-04-27 13:36   ` Derrick Stolee
2021-04-01 15:41 ` [PATCH 19/23] fsmonitor--daemon: use a cookie file to sync with file system Jeff Hostetler via GitGitGadget
2021-04-27 14:23   ` Derrick Stolee [this message]
2021-05-03 21:59     ` Jeff Hostetler
2021-04-01 15:41 ` [PATCH 20/23] fsmonitor: force update index when fsmonitor token advances Jeff Hostetler via GitGitGadget
2021-04-27 14:52   ` Derrick Stolee
2021-04-01 15:41 ` [PATCH 21/23] t7527: create test for fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-04-27 15:41   ` Derrick Stolee
2021-04-01 15:41 ` [PATCH 22/23] p7519: add fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-04-27 15:45   ` Derrick Stolee
2021-04-01 15:41 ` [PATCH 23/23] t7527: test status with untracked-cache and fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-04-27 15:51   ` Derrick Stolee
2021-04-16 22:44 ` [PATCH 00/23] [RFC] Builtin FSMonitor Feature Junio C Hamano
2021-04-20 15:27   ` Johannes Schindelin
2021-04-20 19:13     ` Jeff Hostetler
2021-04-21 13:17     ` Derrick Stolee
2021-04-27 18:49 ` FS Monitor Windows Performance (was [PATCH 00/23] [RFC] Builtin FSMonitor Feature) Derrick Stolee
2021-04-27 19:31 ` FS Monitor macOS " Derrick Stolee
2021-05-22 13:56 ` [PATCH v2 00/28] Builtin FSMonitor Feature Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 01/28] simple-ipc: preparations for supporting binary messages Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 02/28] fsmonitor--daemon: man page Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 03/28] fsmonitor--daemon: update fsmonitor documentation Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 04/28] fsmonitor-ipc: create client routines for git-fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-06-02 11:24     ` Johannes Schindelin
2021-05-22 13:56   ` [PATCH v2 05/28] help: include fsmonitor--daemon feature flag in version info Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 06/28] config: FSMonitor is repository-specific Johannes Schindelin via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 07/28] fsmonitor: introduce `core.useBuiltinFSMonitor` to call the daemon via IPC Johannes Schindelin via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 08/28] fsmonitor--daemon: add a built-in fsmonitor daemon Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 09/28] fsmonitor--daemon: implement client command options Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 10/28] t/helper/fsmonitor-client: create IPC client to talk to FSMonitor Daemon Jeff Hostetler via GitGitGadget
2021-06-11  6:32     ` Junio C Hamano
2021-05-22 13:56   ` [PATCH v2 11/28] fsmonitor-fs-listen-win32: stub in backend for Windows Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 12/28] fsmonitor-fs-listen-macos: stub in backend for MacOS Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 13/28] fsmonitor--daemon: implement daemon command options Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 14/28] fsmonitor--daemon: add pathname classification Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 15/28] fsmonitor--daemon: define token-ids Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 16/28] fsmonitor--daemon: create token-based changed path cache Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 17/28] fsmonitor-fs-listen-win32: implement FSMonitor backend on Windows Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 18/28] fsmonitor-fs-listen-macos: add macos header files for FSEvent Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 19/28] fsmonitor-fs-listen-macos: implement FSEvent listener on MacOS Jeff Hostetler via GitGitGadget
2021-05-22 13:56   ` [PATCH v2 20/28] fsmonitor--daemon: implement handle_client callback Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 21/28] fsmonitor--daemon: periodically truncate list of modified files Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 22/28] fsmonitor--daemon: use a cookie file to sync with file system Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 23/28] fsmonitor: enhance existing comments Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 24/28] fsmonitor: force update index after large responses Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 25/28] t7527: create test for fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 26/28] p7519: add fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 27/28] t7527: test status with untracked-cache and fsmonitor--daemon Jeff Hostetler via GitGitGadget
2021-05-22 13:57   ` [PATCH v2 28/28] t/perf: avoid copying builtin fsmonitor files into test repo Jeff Hostetler via GitGitGadget
2021-05-27  2:06   ` [PATCH v2 00/28] Builtin FSMonitor Feature Junio C Hamano
2021-06-02 11:28     ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5669d313-4084-dceb-1204-acd005e58657@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jeffhost@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git