git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Brandon Williams <bmwill@google.com>
Cc: git@vger.kernel.org, sbeller@google.com, pclouds@gmail.com
Subject: Re: [PATCH v3 00/27] Revamp the attribute system; another round
Date: Thu, 02 Feb 2017 11:14:30 -0800	[thread overview]
Message-ID: <xmqqwpd8s8vd.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170128020207.179015-1-bmwill@google.com> (Brandon Williams's message of "Fri, 27 Jan 2017 18:01:40 -0800")

Brandon Williams <bmwill@google.com> writes:

> Per some of the discussion online and off I locally broke up up the question
> and answer and I wasn't very thrilled with the outcome for a number of reasons.
>
> 1. The API is more complex....
> 2. Performance hit....
> ...
> Given the above, v3 is a reroll of the same design as in v2.  This is a good
> milestone in improving the attribute system as it achieves the goal of making
> the attribute subsystem thread-safe (ie multiple callers can be executing
> inside the attribute system at the same time) and will enable a future series
> to allow pathspec code to call into the attribute system.
>
> Most of the changes in this revision are cosmetic (variable renames, code
> movement, etc) but there was a memory leak that was also fixed.

I am OK with the patches presented in this round, but let me explain
why I still expect that we would eventually end up spliting the
question & answer into separate data structure before we can truly
go multi-threaded.

A typical application would do

	for path taken from some set:
		do_something(path)

and "do something with path" would be another helper function, which
may do

	do_something(path):
		ask 'text' attribute for the path
		switch on the attribute and do different things

With the original API, the latter would statically allocate an array
of <question, answer> pairs, with an optimization to populate
<question> which is immutable (because the codepath is always and
only interested in 'text' attribute, and you need a hash lookup to
intern the string "text" which costs cycles) only once, and make a
call to git_check_attr() function with the "path".  This obviously
will not work when two threads are calling this helper function, as
the threads both want their git_check_attr() to return their answers
to the array, but the <answer> part are shared between the threads.

A naive and inefficient way to split questions and answers is to
have two arrays, allocating the former statically (protected under a
mutex, of course) to avoid repeated cost of interning, while
allocating the latter (and some working area per invocation, like
the check_all_attr[]) dynamically and place it on stack or heap.
Because do_something() will be called number of times, the cost for
allocation and initialization of the <answer> part that is paid per
invocation will of course become very high.

We could in theory keep the original arrangement of having an
array of <question, answer> pairs and restructure the code to do:

	prepare the <question, answer> array
	for path taken from some set:
		do_something(the array, path)

That way, do_something() do not have to keep allocating,
initializing and destroying the array.

But after looking at the current set of codepaths, before coming to
the conclusion that we need to split the static part that is
specific to the callsite for git_check_attr() and the dynamic part
that is specific to the <callsite, thread> pair, I noticed that
typically the callers that can prepare the array before going into
the loop (which will eventually be spread across multiple threads)
are many levels away in the callchain, and they are not even aware
of what attributes are going to be requested in the leaf level
helper functions.  In other words, the approach to hoist "the
<question, answer> array" up in the callchain would not scale.  A
caller that loops over paths in the index and check them out does
not want to know (and we do not want to tell it) what exact
attributes are involved in the decision convert_to_working_tree()
makes for each path, for example.

So how would we split questions and answers in a way that is not
naive and inefficient?  

I envision that we would allow the attribute subsystem to keep track
of the dynamic part, which will receive the answers, holds working
area like check_all_attr[], and has the equivalent to the "attr
stack", indexed by <thread-id, callsite> pair (and the
identification of "callsite" can be done by using the address of the
static part, i.e. the array of questions that we initialize just
once when interning the list of attribute names for the first time).

The API to prepare and ask for attributes may look like:

	static struct attr_static_part Q;
	struct attr_dynamic_part *D;

	attr_check_init(&Q, "text", ...);
	D = git_attr_check(&Q, path);

where Q contains an array of interned attributes (i.e. questions)
and other immutable things that is unique to this callsite, but can
be shared across multiple threads asking the same question from
here.  As an internal implementation detail, it probably will have a
mutex to make sure that init will run only once.

Then the implementation of git_attr_check(&Q, path) would be:

    - see if there is already the "dynaic part" allocated for the
      current thread asking the question Q.  If there is not,
      allocate one and remember it, so that it can be reused in
      later calls by the same thread; if there is, use that existing
      one.

    - reinitialize the "dynamic part" as needed, e.g. clear the
      equivalent to check_all_attr[], adjust the equivalent to
      attr_stack for the current path, etc.  Just like the current
      code optimizes for the case where the entire program (a single
      thread) will ask the same question for paths in traversal
      order (i.e. falling in the same directory), this will optimize
      for the access pattern where each thread asks the same
      question for paths in its traversal order.

    - do what the current collect_some_attrs() thing does.

And this hopefully won't be as costly as the naive and inefficient
one.

The reason why I was pushing hard to split the static part and the
dynamic part in our redesign of the API is primarily because I
didn't want to update the API callers twice.  But I'd imagine that
your v3 (and your earlier "do not discard attr stack, but keep them
around, holding their tips in a hashmap for quick reuse") would at
least lay the foundation for the eventual shape of the API, let's
bite the bullet and accept that we will need to update the callers
again anyway.

Thanks.


  parent reply	other threads:[~2017-02-02 19:14 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-12 23:53 [PATCH 00/27] Revamp the attribute system; another round Brandon Williams
2017-01-12 23:53 ` [PATCH 01/27] commit.c: use strchrnul() to scan for one line Brandon Williams
2017-01-12 23:53 ` [PATCH 02/27] attr.c: " Brandon Williams
2017-01-12 23:53 ` [PATCH 03/27] attr.c: update a stale comment on "struct match_attr" Brandon Williams
2017-01-12 23:53 ` [PATCH 04/27] attr.c: explain the lack of attr-name syntax check in parse_attr() Brandon Williams
2017-01-12 23:53 ` [PATCH 05/27] attr.c: complete a sentence in a comment Brandon Williams
2017-01-12 23:53 ` [PATCH 06/27] attr.c: mark where #if DEBUG ends more clearly Brandon Williams
2017-01-12 23:53 ` [PATCH 07/27] attr.c: simplify macroexpand_one() Brandon Williams
2017-01-12 23:53 ` [PATCH 08/27] attr.c: tighten constness around "git_attr" structure Brandon Williams
2017-01-12 23:53 ` [PATCH 09/27] attr.c: plug small leak in parse_attr_line() Brandon Williams
2017-01-12 23:53 ` [PATCH 10/27] attr: support quoting pathname patterns in C style Brandon Williams
2017-01-12 23:53 ` [PATCH 11/27] attr.c: add push_stack() helper Brandon Williams
2017-01-12 23:53 ` [PATCH 12/27] Documentation: fix a typo Brandon Williams
2017-01-12 23:53 ` [PATCH 13/27] attr.c: outline the future plans by heavily commenting Brandon Williams
2017-01-12 23:53 ` [PATCH 14/27] attr: rename function and struct related to checking attributes Brandon Williams
2017-01-12 23:53 ` [PATCH 15/27] attr: (re)introduce git_check_attr() and struct attr_check Brandon Williams
2017-01-12 23:53 ` [PATCH 16/27] attr: convert git_all_attrs() to use "struct attr_check" Brandon Williams
2017-01-12 23:53 ` [PATCH 17/27] attr: convert git_check_attrs() callers to use the new API Brandon Williams
2017-01-12 23:53 ` [PATCH 18/27] attr: retire git_check_attrs() API Brandon Williams
2017-01-12 23:53 ` [PATCH 19/27] attr: pass struct attr_check to collect_some_attrs Brandon Williams
2017-01-12 23:53 ` [PATCH 20/27] attr: change validity check for attribute names to use positive logic Brandon Williams
2017-01-12 23:53 ` [PATCH 21/27] attr: use hashmap for attribute dictionary Brandon Williams
2017-01-18 20:20   ` Stefan Beller
2017-01-18 20:23     ` Brandon Williams
2017-01-12 23:53 ` [PATCH 22/27] attr: eliminate global check_all_attr array Brandon Williams
2017-01-12 23:53 ` [PATCH 23/27] attr: remove maybe-real, maybe-macro from git_attr Brandon Williams
2017-01-12 23:53 ` [PATCH 24/27] attr: tighten const correctness with git_attr and match_attr Brandon Williams
2017-01-12 23:53 ` [PATCH 25/27] attr: store attribute stacks in hashmap Brandon Williams
2017-01-13 21:20   ` Junio C Hamano
2017-01-18 20:34     ` Brandon Williams
2017-01-23 18:08       ` Brandon Williams
2017-01-18 20:39   ` Stefan Beller
2017-01-18 20:45     ` Stefan Beller
2017-01-18 20:50     ` Brandon Williams
2017-01-12 23:53 ` [PATCH 26/27] attr: push the bare repo check into read_attr() Brandon Williams
2017-01-12 23:53 ` [PATCH 27/27] attr: reformat git_attr_set_direction() function Brandon Williams
2017-01-15 23:47 ` [PATCH 00/27] Revamp the attribute system; another round Junio C Hamano
2017-01-16  8:10   ` Jeff King
2017-01-23 20:34 ` [PATCH v2 " Brandon Williams
2017-01-23 20:34   ` [PATCH v2 01/27] commit.c: use strchrnul() to scan for one line Brandon Williams
2017-01-23 20:35   ` [PATCH v2 02/27] attr.c: " Brandon Williams
2017-01-23 20:35   ` [PATCH v2 03/27] attr.c: update a stale comment on "struct match_attr" Brandon Williams
2017-01-23 20:35   ` [PATCH v2 04/27] attr.c: explain the lack of attr-name syntax check in parse_attr() Brandon Williams
2017-01-23 20:35   ` [PATCH v2 05/27] attr.c: complete a sentence in a comment Brandon Williams
2017-01-23 20:35   ` [PATCH v2 06/27] attr.c: mark where #if DEBUG ends more clearly Brandon Williams
2017-01-23 20:35   ` [PATCH v2 07/27] attr.c: simplify macroexpand_one() Brandon Williams
2017-01-23 20:35   ` [PATCH v2 08/27] attr.c: tighten constness around "git_attr" structure Brandon Williams
2017-01-23 20:35   ` [PATCH v2 09/27] attr.c: plug small leak in parse_attr_line() Brandon Williams
2017-01-23 20:35   ` [PATCH v2 10/27] attr: support quoting pathname patterns in C style Brandon Williams
2017-01-23 20:35   ` [PATCH v2 11/27] attr.c: add push_stack() helper Brandon Williams
2017-01-23 20:35   ` [PATCH v2 12/27] Documentation: fix a typo Brandon Williams
2017-01-23 20:35   ` [PATCH v2 13/27] attr.c: outline the future plans by heavily commenting Brandon Williams
2017-01-23 20:35   ` [PATCH v2 14/27] attr: rename function and struct related to checking attributes Brandon Williams
2017-01-23 20:35   ` [PATCH v2 15/27] attr: (re)introduce git_check_attr() and struct attr_check Brandon Williams
2017-01-23 20:35   ` [PATCH v2 16/27] attr: convert git_all_attrs() to use "struct attr_check" Brandon Williams
2017-01-23 20:35   ` [PATCH v2 17/27] attr: convert git_check_attrs() callers to use the new API Brandon Williams
2017-01-23 20:35   ` [PATCH v2 18/27] attr: retire git_check_attrs() API Brandon Williams
2017-01-23 20:35   ` [PATCH v2 19/27] attr: pass struct attr_check to collect_some_attrs Brandon Williams
2017-01-23 20:35   ` [PATCH v2 20/27] attr: change validity check for attribute names to use positive logic Brandon Williams
2017-01-23 20:35   ` [PATCH v2 21/27] attr: use hashmap for attribute dictionary Brandon Williams
2017-01-23 20:35   ` [PATCH v2 22/27] attr: eliminate global check_all_attr array Brandon Williams
2017-01-23 21:11     ` Junio C Hamano
2017-01-23 20:35   ` [PATCH v2 23/27] attr: remove maybe-real, maybe-macro from git_attr Brandon Williams
2017-01-23 20:35   ` [PATCH v2 24/27] attr: tighten const correctness with git_attr and match_attr Brandon Williams
2017-01-23 20:35   ` [PATCH v2 25/27] attr: store attribute stack in attr_check structure Brandon Williams
2017-01-23 21:42     ` Junio C Hamano
2017-01-23 22:06       ` Brandon Williams
2017-01-24  1:11         ` Brandon Williams
2017-01-24  2:28           ` Junio C Hamano
2017-01-25 19:57             ` Brandon Williams
2017-01-25 20:10               ` Stefan Beller
2017-01-25 20:14               ` Junio C Hamano
2017-01-25 21:54                 ` Brandon Williams
2017-01-25 23:19                   ` Brandon Williams
2017-01-23 20:35   ` [PATCH v2 26/27] attr: push the bare repo check into read_attr() Brandon Williams
2017-01-23 20:35   ` [PATCH v2 27/27] attr: reformat git_attr_set_direction() function Brandon Williams
2017-01-28  2:01   ` [PATCH v3 00/27] Revamp the attribute system; another round Brandon Williams
2017-01-28  2:01     ` [PATCH v3 01/27] commit.c: use strchrnul() to scan for one line Brandon Williams
2017-01-28  2:01     ` [PATCH v3 02/27] attr.c: " Brandon Williams
2017-01-28  2:01     ` [PATCH v3 03/27] attr.c: update a stale comment on "struct match_attr" Brandon Williams
2017-01-28  2:01     ` [PATCH v3 04/27] attr.c: explain the lack of attr-name syntax check in parse_attr() Brandon Williams
2017-01-28  2:01     ` [PATCH v3 05/27] attr.c: complete a sentence in a comment Brandon Williams
2017-01-28  2:01     ` [PATCH v3 06/27] attr.c: mark where #if DEBUG ends more clearly Brandon Williams
2017-01-28  2:01     ` [PATCH v3 07/27] attr.c: simplify macroexpand_one() Brandon Williams
2017-01-28  2:01     ` [PATCH v3 08/27] attr.c: tighten constness around "git_attr" structure Brandon Williams
2017-01-28  2:01     ` [PATCH v3 09/27] attr.c: plug small leak in parse_attr_line() Brandon Williams
2017-01-28  2:01     ` [PATCH v3 10/27] attr: support quoting pathname patterns in C style Brandon Williams
2017-01-28  2:01     ` [PATCH v3 11/27] attr.c: add push_stack() helper Brandon Williams
2017-01-28  2:01     ` [PATCH v3 12/27] Documentation: fix a typo Brandon Williams
2017-01-28  2:01     ` [PATCH v3 13/27] attr.c: outline the future plans by heavily commenting Brandon Williams
2017-01-28  2:01     ` [PATCH v3 14/27] attr: rename function and struct related to checking attributes Brandon Williams
2017-01-28  2:01     ` [PATCH v3 15/27] attr: (re)introduce git_check_attr() and struct attr_check Brandon Williams
2017-01-30 18:05       ` Brandon Williams
2017-01-28  2:01     ` [PATCH v3 16/27] attr: convert git_all_attrs() to use "struct attr_check" Brandon Williams
2017-01-28 23:50       ` Stefan Beller
2017-01-29  2:44         ` Brandon Williams
2017-01-30 18:06       ` Brandon Williams
2017-01-28  2:01     ` [PATCH v3 17/27] attr: convert git_check_attrs() callers to use the new API Brandon Williams
2017-01-28  2:01     ` [PATCH v3 18/27] attr: retire git_check_attrs() API Brandon Williams
2017-01-28  2:01     ` [PATCH v3 19/27] attr: pass struct attr_check to collect_some_attrs Brandon Williams
2017-01-28  2:02     ` [PATCH v3 20/27] attr: change validity check for attribute names to use positive logic Brandon Williams
2017-01-28  2:02     ` [PATCH v3 21/27] attr: use hashmap for attribute dictionary Brandon Williams
2017-01-28  2:02     ` [PATCH v3 22/27] attr: eliminate global check_all_attr array Brandon Williams
2017-01-28  2:02     ` [PATCH v3 23/27] attr: remove maybe-real, maybe-macro from git_attr Brandon Williams
2017-01-28  2:02     ` [PATCH v3 24/27] attr: tighten const correctness with git_attr and match_attr Brandon Williams
2017-01-28  2:02     ` [PATCH v3 25/27] attr: store attribute stack in attr_check structure Brandon Williams
2017-01-28  2:02     ` [PATCH v3 26/27] attr: push the bare repo check into read_attr() Brandon Williams
2017-01-28  2:02     ` [PATCH v3 27/27] attr: reformat git_attr_set_direction() function Brandon Williams
2017-02-02 19:14     ` Junio C Hamano [this message]
2017-02-09 17:18       ` [PATCH v3 00/27] Revamp the attribute system; another round Brandon Williams
2017-02-09 19:31         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqwpd8s8vd.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).