git.vger.kernel.org archive mirror
From: Calvin Wan <calvinwan@google.com>
To: Git Mailing List <git@vger.kernel.org>
Subject: Parallelism defaults and config options
Date: Mon, 24 Oct 2022 16:08:24 -0700
Message-ID: <CAFySSZAbsPuyPVX0+DQzArny2CEWs+GpQqJ3AOxUB_ffo8B3SQ@mail.gmail.com>

While trying to figure out how I was going to set the default
parallelism in cw/submodule-status-in-parallel, I noticed some
discrepancies between how all the parallelism config options are set in
git. I wanted to discuss what we can do now to make them more consistent
and also what the standard should be for the future. Here is a list of
parallelism config options in git (let me know if I missed any) and how
they're set:

grep.threads: if unset or set to 0, git uses number of logical cores.
index.threads: if unset or set to 0/true, git uses number of logical
  cores. If set to 1/false, multithreading is disabled
pack.threads: if unset or set to 0, git uses number of logical cores.
  (documentation doesn't mention what default is)
checkout.workers: if unset, defaults to 1. If set to < 1, git uses
  number of logical cores
fetch.parallel: if unset, defaults to 1. If set to 0, git uses number of
  logical cores (documentation says reasonable default)
http.maxRequests: if unset, defaults to 5. If set to < 1, git uses the
  default 5.
submodule.fetchJobs: if unset, defaults to 1. If set to 0, git uses
  number of logical cores (documentation says reasonable default)
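To make the list above easier to compare at a glance, here is a rough
Python model of it. This is only a sketch of the behavior as described
in this mail, not git's actual code: RULES and effective_jobs() are
hypothetical names, os.cpu_count() stands in for online_cpus(), and the
true/false spellings of index.threads are left out. Negative values are
only modeled where the list says what happens.

```python
import os

CPUS = "cpus"  # marker meaning "use the number of logical cores"

# option -> (value when unset, meaning of 0, meaning of a negative value)
# None means the list above does not say what happens.
RULES = {
    "grep.threads":        (CPUS, CPUS, None),
    "index.threads":       (CPUS, CPUS, None),
    "pack.threads":        (CPUS, CPUS, None),
    "checkout.workers":    (1,    CPUS, CPUS),
    "fetch.parallel":      (1,    CPUS, None),
    "http.maxRequests":    (5,    5,    5),
    "submodule.fetchJobs": (1,    CPUS, None),
}


def effective_jobs(option, value=None, ncpus=None):
    """Return the parallelism the list above implies for `option`.

    `value` is None when the option is unset, otherwise an integer.
    `ncpus` can be passed explicitly to make results reproducible.
    """
    ncpus = ncpus or os.cpu_count() or 1
    when_unset, when_zero, when_negative = RULES[option]
    if value is None:
        pick = when_unset
    elif value == 0:
        pick = when_zero
    elif value < 0:
        if when_negative is None:
            raise ValueError(f"{option}: negative values not modeled here")
        pick = when_negative
    else:
        pick = value
    return ncpus if pick == CPUS else pick
```

For example, on a 48-core machine effective_jobs("grep.threads",
ncpus=48) gives 48, while effective_jobs("checkout.workers", ncpus=48)
gives 1.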

The first inconsistency is the difference in language used to describe
when each option is set to "online_cpus()". Some are explicit while
others omit it or use language such as "reasonable default". Being
explicit for all of the options is probably the easiest documentation
fix.

The next inconsistency is for values < 1. Most options use online_cpus()
when set to 0, except index.threads, which is a special case of its own.
Some options error out when set to a negative number, while
checkout.workers falls back to online_cpus() and http.maxRequests falls
back to 5. I don't think we can fix this retroactively unless we decide
that every config option falls back to online_cpus() when the value is
negative. Should that be the case going forward, or should 0 be the only
special-cased value for now? I can see an argument for reserving
negative values so that they can select different defaulting options in
the future.
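The two candidate rules in the paragraph above can be put side by side
in a few lines of Python. Both functions are hypothetical illustrations
of the question being asked, not anything git implements today; ncpus
stands in for the result of online_cpus().

```python
def resolve_nonpositive_as_cpus(value, ncpus):
    """Rule A: every value below 1 means "use all logical cores"."""
    return ncpus if value < 1 else value


def resolve_zero_only(value, ncpus):
    """Rule B: only 0 is special; negative values stay reserved."""
    if value < 0:
        raise ValueError("negative values are reserved for future use")
    return ncpus if value == 0 else value
```

Rule A matches what checkout.workers does today; rule B keeps negative
numbers free for other meanings later.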

The final inconsistency is how values are defaulted if unset. Some
default to online_cpus() while others default to 1 (http.maxRequests
defaults to 5). I want to call out grep.threads specifically here -- on
my machine
with 48 cores, the default is actually SLOWER than using 1 thread. This
is because the grep operation is heavily IO bound, so creating too many
threads adds overhead every time the read head changes. Interestingly,
this option runs optimally at 4 since that's how many PCIe lanes my SSD
uses. While it makes sense to default processor heavy operations to
online_cpus(), does it make sense to do the same for IO heavy
operations? (I wasn't able to find an equivalent of online_cpus() for
drive reading capabilities.) And what about operations that have a fair
mix of each?
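For anyone who wants to repeat the grep.threads comparison on their own
hardware, a small timing harness along these lines might help. It is a
sketch under assumptions: grep_command() and time_grep() are
hypothetical helper names, the pattern and repository path are up to
the caller, and the numbers depend heavily on whether the files are
already in the page cache.

```python
import subprocess
import time


def grep_command(pattern, threads):
    """Build a `git grep` invocation with grep.threads overridden."""
    return ["git", "-c", f"grep.threads={threads}", "grep", "-e", pattern]


def time_grep(pattern, threads, repo=".", runs=3):
    """Return the best wall-clock time (seconds) over `runs` runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # git grep exits with 1 when nothing matches, so the exit
        # status is deliberately not checked here.
        subprocess.run(
            grep_command(pattern, threads),
            cwd=repo,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best
```

Calling time_grep("TODO", n) for n in (1, 2, 4, 8, 48) inside a large
checkout should show where the curve flattens or turns around on a
given machine.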

The safe option is to default to 1 process for many of these config
options, but that gives up an improved experience for the average
user who is unaware of these options. If we're already defaulting to
online_cpus() for grep.threads and selecting 5 for http.maxRequests,
then why not do the same for other options? My suggestion would be
defaulting IO dominant operations to min(4, online_cpus()) since that
seems like the standard number of lanes for people using SSDs. I would
also default operations that have a mix of both to
min(8, online_cpus()).
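Spelled out as a sketch in Python, the suggestion would look roughly
like this. suggested_default() and the "io" / "mixed" / "cpu" labels
are hypothetical names used only for illustration, and os.cpu_count()
stands in for online_cpus().

```python
import os


def suggested_default(kind, ncpus=None):
    """Proposed default parallelism for an unset option.

    kind is "io" for IO dominant operations, "mixed" for operations
    with a mix of IO and CPU work, and "cpu" for processor heavy ones.
    """
    ncpus = ncpus or os.cpu_count() or 1
    if kind == "io":
        return min(4, ncpus)
    if kind == "mixed":
        return min(8, ncpus)
    if kind == "cpu":
        return ncpus
    raise ValueError(f"unknown workload kind: {kind!r}")
```

On a 48-core machine this gives 4, 8 and 48 respectively; on a 2-core
machine all three come out as 2.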
-- 
Calvin Wan
