All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Jeffrey Walton" <noloader@gmail.com>,
	"Michał Kiedrowicz" <michal.kiedrowicz@gmail.com>,
	"J Smith" <dark.panda@gmail.com>,
	"Victor Leschuk" <vleschuk@gmail.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Thomas Rast" <trast@student.ethz.ch>,
	"Fredrik Kuivinen" <frekui@gmail.com>
Subject: Re: [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0
Date: Fri, 14 Apr 2017 17:23:25 -0400	[thread overview]
Message-ID: <20170414212325.fefrl3qdjigwyitd@sigill.intra.peff.net> (raw)
In-Reply-To: <CACBZZX7Xi2OWqHQd7jTGBEZyqcWk59oXbPJOjuYrYAFzd5huCA@mail.gmail.com>

On Tue, Apr 11, 2017 at 10:56:01PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Right, my suggestion was to teach "grep" to treat --threads=1 as "do not
> > spawn any other threads". I.e., to make it like the "0" case you were
> > proposing, and then leave "0" as "auto-detect". There would be no way to
> > spawn a _single_ thread and feed it. But why would you want to do that?
> > It's always going to be strictly worse than not threading at all.
> 
> I understand, but given the two profiles we've posted it seems clear
> that there's cases where if we did that, we'd be locking people out of
> their optimal thread configuration, which would be --thread=1 with my
> patch, but wouldn't exist with this proposed change.

Maybe I don't understand your profiles. For a single-core machine you
probably want fewer threads, right? There is no such thing as "0"
threads, as you always have the original main thread in which we would
do the work.  So the lowest you can go is "1" (it's a separate question
of what --threads=0 should "mean"; I think we should keep it as
"auto-detect" for compatibility).

We could implement the single-thread case by spawning off one worker
thread (and effectively having 2 threads, but one is just sitting in
pthread_join()). And I think that's how it's implemented now in
git-grep. But we can optimize out the creation of the second thread
entirely, and just do the work in the main thread.

That saves a little bit of thread-spawning overhead, and it also makes
debugging much more pleasant.  For --threads=2, you'd always have to
kick in the thread-spawning code, and you'd spawn two worker threads
(and the main thread just sits there).

IOW, I am just proposing something like this:

diff --git a/builtin/grep.c b/builtin/grep.c
index 65070c52f..76ce38404 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -326,7 +326,7 @@ static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	}
 
 #ifndef NO_PTHREADS
-	if (num_threads) {
+	if (num_threads > 1) {
 		add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, oid);
 		strbuf_release(&pathbuf);
 		return 0;
@@ -360,7 +360,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 	}
 
 #ifndef NO_PTHREADS
-	if (num_threads) {
+	if (num_threads > 1) {
 		add_work(opt, GREP_SOURCE_FILE, buf.buf, filename, filename);
 		strbuf_release(&buf);
 		return 0;

where we fall back to the same code that NO_PTHREADS uses when there is
only a single thread requested.

> Anyway, I really don't care about this feature much, I just wanted a
> way to disable threading, but looking at the perf profiles I wonder if
> doing your proposed change would cause a regression in some cases
> where someone really wanted /one/ thread.

I'm not sure what would regress. Right now --threads=1 only does
work in a single worker thread. And --threads=2 does it in 2, and so on.
In all cases, the original main-thread is just farming out work and
waiting on pthread_join() (let's call that the controller thread).  So
why would you ever want the "controller plus one worker" setup? It's
strictly worse than "controller just does the work".

> But of course my patch breaks the long documented grep.threads=0 for
> "give me threads that you auto detect" to now mean "you get none".

Right, that's what I'm concerned about.

> Also doesn't --thread=1 right now mean "one thread, but two workers?".
> I haven't dug into the grep worker/thread code, but it compiles the
> the pattern twice, so isn't both the non-thread main process & the
> sole thread it spawns on --thread=1 doing work, so in some other
> universe it's synonymous with --workers=2?

I think --threads=1 right now means "one worker thread". I think the
main program calls compile_grep_patterns() to make sure they are sane,
before it even considers whether and how to thread.  And then each
worker thread duplicates them and re-compiles them itself (IIRC, this is
because the regex code may not be thread-safe).

> If so do pack-objects & index-pack also behave like that? If so this
> whole thing is very confusing for users, because some will read 1
> thread and think "one worker", whereas it really means "two workers,
> one using a thread, if you want three workers spawn two threads".

No, I think --threads is "this many workers". If you have more than one
worker, you may have an extra thread farming out work to them, but that
isn't counted (and is mostly dormant).

-Peff

  reply	other threads:[~2017-04-14 21:23 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-08 13:24 [PATCH 00/12] PCREv2 & more Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 01/12] grep: add ability to disable threading with --threads=0 or grep.threads=0 Ævar Arnfjörð Bjarmason
2017-04-11 10:06   ` Jeff King
2017-04-11 20:20     ` Ævar Arnfjörð Bjarmason
2017-04-11 20:34       ` Jeff King
2017-04-11 20:56         ` Ævar Arnfjörð Bjarmason
2017-04-14 21:23           ` Jeff King [this message]
2017-04-16 22:25             ` Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 02/12] grep: remove redundant regflags assignment under PCRE Ævar Arnfjörð Bjarmason
2017-04-11 10:10   ` Jeff King
2017-04-08 13:24 ` [PATCH 03/12] Makefile & configure: reword outdated comment about PCRE Ævar Arnfjörð Bjarmason
2017-04-11 10:14   ` Jeff King
2017-04-15 12:10     ` Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 04/12] grep: add a test for backreferences in PCRE patterns Ævar Arnfjörð Bjarmason
2017-04-08 13:24 ` [PATCH 05/12] log: add exhaustive tests for pattern style options & config Ævar Arnfjörð Bjarmason
2017-04-11 10:23   ` Jeff King
2017-04-11 10:32     ` Ævar Arnfjörð Bjarmason
2017-04-11 10:51       ` Jeff King
2017-04-08 13:25 ` [PATCH 06/12] log: add -P as a synonym for --perl-regexp Ævar Arnfjörð Bjarmason
2017-04-10  2:39   ` Junio C Hamano
2017-04-11 10:26   ` Jeff King
2017-04-08 13:25 ` [PATCH 07/12] grep & rev-list doc: stop promising libpcre " Ævar Arnfjörð Bjarmason
2017-04-08 13:25 ` [PATCH 08/12] grep: make grep.patternType=[pcre|pcre1] a synonym for "perl" Ævar Arnfjörð Bjarmason
2017-04-11 10:30   ` Jeff King
2017-04-08 13:25 ` [PATCH 09/12] test-lib: rename the LIBPCRE prerequisite to PCRE Ævar Arnfjörð Bjarmason
2017-04-11 10:31   ` Jeff King
2017-04-08 13:25 ` [PATCH 10/12] grep: change the internal PCRE macro names to be PCRE1 Ævar Arnfjörð Bjarmason
2017-04-11 10:35   ` Jeff King
2017-04-11 10:51     ` Ævar Arnfjörð Bjarmason
2017-04-11 10:53       ` Jeff King
2017-04-08 13:25 ` [PATCH 11/12] grep: change the internal PCRE code & header " Ævar Arnfjörð Bjarmason
2017-04-11 10:37   ` Jeff King
2017-04-11 10:45     ` Ævar Arnfjörð Bjarmason
2017-04-11 10:48       ` Jeff King
2017-04-11 11:02         ` Ævar Arnfjörð Bjarmason
2017-04-11 12:57           ` Jeff King
2017-04-11 16:51             ` Brandon Williams
2017-04-11 18:31               ` Ævar Arnfjörð Bjarmason
2017-04-08 13:25 ` [PATCH 12/12] grep: add support for PCRE v2 Ævar Arnfjörð Bjarmason
2017-04-11 10:43   ` Jeff King
2017-04-11 10:47 ` [PATCH 00/12] PCREv2 & more Jeff King
2017-04-15  8:11   ` Junio C Hamano
2017-04-15  9:50     ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170414212325.fefrl3qdjigwyitd@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=dark.panda@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=michal.kiedrowicz@gmail.com \
    --cc=noloader@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=trast@student.ethz.ch \
    --cc=vleschuk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.