Git Mailing List Archive on lore.kernel.org
From: Derrick Stolee <stolee@gmail.com>
To: Jeff Hostetler via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH 12/23] fsmonitor--daemon: create token-based changed path cache
Date: Mon, 26 Apr 2021 16:22:45 -0400
Message-ID: <df33c12b-ada3-05dc-3d17-6cc9d205b4cc@gmail.com>
In-Reply-To: <f1fa803ebe9c9f78608c22e55ec590f8c6775c94.1617291666.git.gitgitgadget@gmail.com>

On 4/1/2021 11:40 AM, Jeff Hostetler via GitGitGadget wrote:
> From: Jeff Hostetler <jeffhost@microsoft.com>
> 
> Teach fsmonitor--daemon to build lists of changed paths and associate
> them with a token-id.  This will be used by the platform-specific
> backends to accumulate changed paths in response to filesystem events.
> 
> The platform-specific event loops receive batches containing one or
> more changed paths.  Their fs listener thread will accumulate them in

I think the lowercase "fs" here is strange. Dropping it would leave
"Their listener thread", which could be mistaken for the IPC listener,
so it's probably best to spell it out: "Their filesystem listener thread".

> a `fsmonitor_batch` (and without locking) and then "publish" them to
> associate them with the current token and to make them visible to the
> client worker threads.
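To make the split concrete, here is a minimal sketch of accumulate-then-publish, assuming hypothetical `demo_batch`/`demo_state` types and `demo_batch_add()`/`demo_publish()` helpers that stand in for the patch's own: the listener grows a private batch with no locking at all, and only the final hand-off takes the shared mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the patch's structures. */
struct demo_batch {
	struct demo_batch *next;
	const char **paths;
	size_t nr, alloc;
};

struct demo_state {
	pthread_mutex_t lock;      /* guards 'head' only */
	struct demo_batch *head;   /* published batches, newest first */
};

/*
 * Listener-private step: no lock is needed because no other thread can
 * see 'b' until it is published.  Returns -1 if the array cannot grow.
 */
static int demo_batch_add(struct demo_batch *b, const char *path)
{
	if (b->nr == b->alloc) {
		size_t alloc = b->alloc ? 2 * b->alloc : 8;
		const char **p = realloc(b->paths, alloc * sizeof(*p));

		if (!p)
			return -1;
		b->paths = p;
		b->alloc = alloc;
	}
	b->paths[b->nr++] = path;
	return 0;
}

/* Publishing is the only step that touches shared state. */
static void demo_publish(struct demo_state *s, struct demo_batch *b)
{
	pthread_mutex_lock(&s->lock);
	b->next = s->head;
	s->head = b;
	pthread_mutex_unlock(&s->lock);
}
```

Because the batch is invisible to other threads until `demo_publish()` links it in, the mutex is held for two pointer assignments per batch rather than once per event.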
...
> +struct fsmonitor_batch {
> +	struct fsmonitor_batch *next;
> +	uint64_t batch_seq_nr;
> +	const char **interned_paths;
> +	size_t nr, alloc;
> +	time_t pinned_time;
> +};

A linked list makes it easy to add new batches while clients are
consuming older ones, and batching the paths keeps that efficient.
I can see how this will work out nicely.
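A sketch of why prepending suits this, assuming (for illustration only) that a reader remembers the sequence number of the newest batch it has already seen: it walks from the head until it reaches that number, and a writer prepending new batches never disturbs a reader that already holds an older head pointer. The `demo_` names are hypothetical and each path array is reduced to a count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_batch {
	struct demo_batch *next;
	uint64_t seq_nr;
	size_t nr;   /* number of paths; the paths themselves are omitted */
};

/* Prepend 'b' and give it the next sequence number; returns the new head. */
static struct demo_batch *demo_prepend(struct demo_batch *head,
				       struct demo_batch *b)
{
	b->seq_nr = head ? head->seq_nr + 1 : 0;
	b->next = head;
	return b;
}

/* Sum the path counts of batches newer than 'since', newest first. */
static size_t demo_count_since(const struct demo_batch *head, uint64_t since)
{
	size_t n = 0;

	for (; head && head->seq_nr > since; head = head->next)
		n += head->nr;
	return n;
}
```

Older nodes are never modified by a prepend, which is what lets a reader traverse from its snapshot without holding the lock for the whole walk.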

> +struct fsmonitor_batch *fsmonitor_batch__new(void)
> +{
> +	struct fsmonitor_batch *batch = xcalloc(1, sizeof(*batch));

I mentioned earlier that I think `CALLOC_ARRAY(batch, 1)` is the
typical pattern here.
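For reference, the suggested pattern in a standalone sketch; `demo_xcalloc()` and `DEMO_CALLOC_ARRAY()` are local stand-ins approximating git's `xcalloc()` and `CALLOC_ARRAY()` so the snippet compiles outside the git tree. The point of the macro is that the element size comes from `*(x)`, so the type is never spelled twice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Stand-in for git's xcalloc(): exits instead of returning NULL. */
static void *demo_xcalloc(size_t nmemb, size_t size)
{
	void *ret = calloc(nmemb, size);

	if (!ret) {
		fputs("fatal: out of memory\n", stderr);
		exit(128);
	}
	return ret;
}

/* Approximation of CALLOC_ARRAY(): element size is taken from *(x). */
#define DEMO_CALLOC_ARRAY(x, alloc) (x) = demo_xcalloc((alloc), sizeof(*(x)))

/* Same fields as the quoted struct fsmonitor_batch. */
struct demo_batch {
	struct demo_batch *next;
	uint64_t batch_seq_nr;
	const char **interned_paths;
	size_t nr, alloc;
	time_t pinned_time;
};

static struct demo_batch *demo_batch_new(void)
{
	struct demo_batch *batch;

	DEMO_CALLOC_ARRAY(batch, 1);
	return batch;
}
```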

> +
> +	return batch;
> +}
> +
> +struct fsmonitor_batch *fsmonitor_batch__free(struct fsmonitor_batch *batch)

Since this function frees only the tip of the list and returns the
next item (instead of freeing the entire list), perhaps it would be
better named _pop()?
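A minimal sketch of the pop-style helper under the suggested naming, using a hypothetical `demo_batch` type. Unlike the quoted function, it also calls `free(batch)` on the node itself; as quoted, only the paths array is released, so unless some caller frees the node separately, each freed batch (including the one merged away in `fsmonitor_publish()`) appears to leak.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_batch {
	struct demo_batch *next;
	const char **interned_paths;   /* array is owned, strings are interned */
	size_t nr, alloc;
};

/*
 * Free the head node and return the rest of the list.  Only the array is
 * released; the interned strings it points at belong to the intern pool.
 */
static struct demo_batch *demo_batch_pop(struct demo_batch *batch)
{
	struct demo_batch *next;

	if (!batch)
		return NULL;
	next = batch->next;
	free(batch->interned_paths);
	free(batch);
	return next;
}
```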

> +{
> +	struct fsmonitor_batch *next;
> +
> +	if (!batch)
> +		return NULL;
> +
> +	next = batch->next;
> +
> +	/*
> +	 * The actual strings within the array are interned, so we don't
> +	 * own them.
> +	 */
> +	free(batch->interned_paths);
> +
> +	return next;
> +}
> +
> +void fsmonitor_batch__add_path(struct fsmonitor_batch *batch,
> +			       const char *path)
> +{
> +	const char *interned_path = strintern(path);

This use of interned paths is interesting, although I am concerned
about how much memory we will consume over the lifetime of the
process, since interned strings are never freed. This could be a
target for future improvements, perhaps with an LRU cache or
something similar.
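To illustrate the concern, here is a toy intern pool. It uses a linear scan for brevity and is not git's `strintern()` implementation; the names are hypothetical. Repeated events for the same path cost nothing extra, but every distinct path ever seen stays resident, because nothing ever removes an entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy intern pool: linear search, and it never shrinks. */
struct demo_intern_pool {
	char **strs;
	size_t nr, alloc;
	size_t bytes;   /* total bytes held; this only ever grows */
};

static const char *demo_intern(struct demo_intern_pool *pool, const char *s)
{
	size_t i, len;
	char *copy;

	for (i = 0; i < pool->nr; i++)
		if (!strcmp(pool->strs[i], s))
			return pool->strs[i];   /* equal strings share one copy */

	if (pool->nr == pool->alloc) {
		size_t alloc = pool->alloc ? 2 * pool->alloc : 16;
		char **p = realloc(pool->strs, alloc * sizeof(*p));

		if (!p)
			return NULL;
		pool->strs = p;
		pool->alloc = alloc;
	}
	len = strlen(s) + 1;
	copy = malloc(len);
	if (!copy)
		return NULL;
	memcpy(copy, s, len);
	pool->strs[pool->nr++] = copy;
	pool->bytes += len;
	return copy;
}
```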

> +
> +	trace_printf_key(&trace_fsmonitor, "event: %s", interned_path);
> +
> +	ALLOC_GROW(batch->interned_paths, batch->nr + 1, batch->alloc);
> +	batch->interned_paths[batch->nr++] = interned_path;
> +}
> +
> +static void fsmonitor_batch__combine(struct fsmonitor_batch *batch_dest,
> +				     const struct fsmonitor_batch *batch_src)
> +{
> +	/* assert state->main_lock */
> +

This comment seems stale, since there is no `state` in this function.

> +	size_t k;
> +
> +	ALLOC_GROW(batch_dest->interned_paths,
> +		   batch_dest->nr + batch_src->nr + 1,
> +		   batch_dest->alloc);
> +
> +	for (k = 0; k < batch_src->nr; k++)
> +		batch_dest->interned_paths[batch_dest->nr++] =
> +			batch_src->interned_paths[k];
> +}
> +
> +static void fsmonitor_free_token_data(struct fsmonitor_token_data *token)

This one _does_ free the whole list.
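If the per-node helper becomes `_pop()`, the whole-list teardown could get its own explicit helper. A minimal sketch with hypothetical `demo_` names; it returns the number of nodes freed only so that its behavior is easy to check:

```c
#include <assert.h>
#include <stdlib.h>

struct demo_batch {
	struct demo_batch *next;
	const char **interned_paths;   /* array is owned, strings are interned */
	size_t nr, alloc;
};

/*
 * Free every node in the list, including each node's paths array (but not
 * the interned strings).  Returns how many nodes were freed.
 */
static size_t demo_batch_free_list(struct demo_batch *batch)
{
	size_t count = 0;

	while (batch) {
		struct demo_batch *next = batch->next;

		free(batch->interned_paths);
		free(batch);
		batch = next;
		count++;
	}
	return count;
}
```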

> +{
> +	struct fsmonitor_batch *p;
> +
> +	if (!token)
> +		return;
> +
> +	assert(token->client_ref_count == 0);
> +
> +	strbuf_release(&token->token_id);
> +
> +	for (p = token->batch_head; p; p = fsmonitor_batch__free(p))
> +		;
> +
> +	free(token);
> +}
> +
> +/*
> + * Flush all of our cached data about the filesystem.  Call this if we
> + * lose sync with the filesystem and miss some notification events.
> + *
> + * [1] If we are missing events, then we no longer have a complete
> + *     history of the directory (relative to our current start token).
> + *     We should create a new token and start fresh (as if we just
> + *     booted up).
> + *
> + * If there are no readers of the current token data series, we
> + * can free it now.  Otherwise, let the last reader free it.  Either
> + * way, the old token data series is no longer associated with our
> + * state data.
> + */
> +void fsmonitor_force_resync(struct fsmonitor_daemon_state *state)
> +{
> +	struct fsmonitor_token_data *free_me = NULL;
> +	struct fsmonitor_token_data *new_one = NULL;
> +
> +	new_one = fsmonitor_new_token_data();
> +
> +	pthread_mutex_lock(&state->main_lock);
> +
> +	trace_printf_key(&trace_fsmonitor,
> +			 "force resync [old '%s'][new '%s']",
> +			 state->current_token_data->token_id.buf,
> +			 new_one->token_id.buf);
> +
> +	if (state->current_token_data->client_ref_count == 0)
> +		free_me = state->current_token_data;
> +	state->current_token_data = new_one;
> +
> +	pthread_mutex_unlock(&state->main_lock);
> +
> +	fsmonitor_free_token_data(free_me);
> +}
> +

Swap the pointer under a lock, free outside of it. Good.
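The same pattern in a minimal standalone sketch, assuming a hypothetical `demo_token` with just a reference count and an id (the real token data also carries the batch list). The old token is freed here only when no client holds a reference; otherwise it is left for the last reader, as the quoted comment describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct demo_token {
	int client_ref_count;   /* readers currently using this token */
	int id;
};

struct demo_state {
	pthread_mutex_t main_lock;
	struct demo_token *current;
};

/*
 * Install a new token.  The pointer swap happens under the lock; the free
 * happens after unlocking.  Returns 1 if the old token was freed here,
 * 0 if there was none or a reader still holds it, -1 on allocation failure.
 */
static int demo_resync(struct demo_state *s, int new_id)
{
	struct demo_token *free_me = NULL;
	struct demo_token *new_one = calloc(1, sizeof(*new_one));
	int freed;

	if (!new_one)
		return -1;
	new_one->id = new_id;

	pthread_mutex_lock(&s->main_lock);
	if (s->current && s->current->client_ref_count == 0)
		free_me = s->current;
	s->current = new_one;
	pthread_mutex_unlock(&s->main_lock);

	freed = free_me != NULL;
	free(free_me);
	return freed;
}
```

Computing `freed` before calling `free()` avoids inspecting a pointer value after it has been released.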

> +/*
> + * We try to combine small batches at the front of the batch-list to avoid
> + * having a long list.  This hopefully makes it a little easier when we want
> + * to truncate and maintain the list.  However, we don't want the paths array
> + * to just keep growing and growing with realloc, so we insert an arbitrary
> + * limit.
> + */
> +#define MY_COMBINE_LIMIT (1024)
> +
> +void fsmonitor_publish(struct fsmonitor_daemon_state *state,
> +		       struct fsmonitor_batch *batch,
> +		       const struct string_list *cookie_names)
> +{
> +	if (!batch && !cookie_names->nr)
> +		return;
> +
> +	pthread_mutex_lock(&state->main_lock);
> +
> +	if (batch) {
> +		struct fsmonitor_batch *head;
> +
> +		head = state->current_token_data->batch_head;
> +		if (!head) {
> +			batch->batch_seq_nr = 0;
> +			batch->next = NULL;
> +			state->current_token_data->batch_head = batch;
> +			state->current_token_data->batch_tail = batch;
> +		} else if (head->pinned_time) {
> +			/*
> +			 * We cannot alter the current batch list
> +			 * because:
> +			 *
> +			 * [a] it is being transmitted to at least one
> +			 * client and the handle_client() thread has a
> +			 * ref-count, but not a lock on the batch list
> +			 * starting with this item.
> +			 *
> +			 * [b] it has been transmitted in the past to
> +			 * at least one client such that future
> +			 * requests are relative to this head batch.
> +			 *
> +			 * So, we can only prepend a new batch onto
> +			 * the front of the list.
> +			 */
> +			batch->batch_seq_nr = head->batch_seq_nr + 1;
> +			batch->next = head;
> +			state->current_token_data->batch_head = batch;
> +		} else if (head->nr + batch->nr > MY_COMBINE_LIMIT) {
> +			/*
> +			 * The head batch in the list has never been
> +			 * transmitted to a client, but folding the
> +			 * contents of the new batch onto it would
> +			 * exceed our arbitrary limit, so just prepend
> +			 * the new batch onto the list.
> +			 */
> +			batch->batch_seq_nr = head->batch_seq_nr + 1;
> +			batch->next = head;
> +			state->current_token_data->batch_head = batch;
> +		} else {
> +			/*
> +			 * We are free to append the paths in the given
> +			 * batch onto the end of the current head batch.
> +			 */
> +			fsmonitor_batch__combine(head, batch);
> +			fsmonitor_batch__free(batch);
> +		}
> +	}
> +
> +	pthread_mutex_unlock(&state->main_lock);
> +}

I appreciate the careful comments in this critical piece of the
data structure. Also, it is good that the listener already has a
batch of results to merge into the list, instead of taking the lock
for every filesystem event.
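The branch structure of `fsmonitor_publish()` can also be exercised in isolation. Below is a minimal single-threaded sketch, assuming path storage is reduced to a count and the combine limit is shrunk to 4 so every branch is easy to hit; the `demo_` names are hypothetical, and the locking, `batch_tail`, and cookie handling are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_COMBINE_LIMIT 4   /* tiny limit so the sketch is easy to test */

struct demo_batch {
	struct demo_batch *next;
	uint64_t seq_nr;
	size_t nr;     /* path count; path storage omitted */
	int pinned;    /* nonzero once the batch has been sent to a client */
};

/*
 * Mirrors the three cases in the quoted fsmonitor_publish().  Returns the
 * new head; frees 'batch' when its contents are merged into the head.
 */
static struct demo_batch *demo_publish(struct demo_batch *head,
				       struct demo_batch *batch)
{
	if (!head) {
		batch->seq_nr = 0;
		batch->next = NULL;
		return batch;
	}
	if (head->pinned || head->nr + batch->nr > DEMO_COMBINE_LIMIT) {
		/* must not (or should not) grow the head: prepend instead */
		batch->seq_nr = head->seq_nr + 1;
		batch->next = head;
		return batch;
	}
	head->nr += batch->nr;   /* stands in for copying the path pointers */
	free(batch);
	return head;
}

static struct demo_batch *demo_new(size_t nr)
{
	struct demo_batch *b = calloc(1, sizeof(*b));

	if (b)
		b->nr = nr;
	return b;
}
```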

Thanks,
-Stolee

